ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Inconsistent results between two successive runs #1982

Open KamarajuKusumanchi opened 1 month ago

KamarajuKusumanchi commented 1 month ago

Describe bug

The OHLC data returned by yf.download() is not reproducible across runs. For example, downloading daily OHLC data for SPY twice in a row gives slightly different numbers. Ideally, the numbers should be identical.

Simple code that reproduces your problem

Consider the following code. It downloads daily OHLC data for SPY and writes it to a file.

 % cat sp500_daily_ohlc.py 
import yfinance as yf
from datetime import datetime, date

df = yf.download(["SPY"], date(2023, 1, 1), date(2024, 7, 2))

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

Run the code twice

 % python sp500_daily_ohlc.py
[*********************100%%**********************]  1 of 1 completed
writing data into daily_20240713_193451.csv
 % python sp500_daily_ohlc.py
[*********************100%%**********************]  1 of 1 completed
writing data into daily_20240713_193457.csv

Ideally, these two files should be identical, but they are not.

 % diff daily_20240713_193451.csv daily_20240713_193457.csv | wc -l
466
rajulocal@hogwarts ~/work/github/market_data_processor/src/inprogress
 % diff daily_20240713_193451.csv daily_20240713_193457.csv | head -n 20
2,6c2,6
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700
< 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.34478759765625,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604736328125,104189600
< 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451721191406,73978100
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542419433594,74850700
> 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319885253906,85934100
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
> 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451416015625,73978100
8,10c8,10
< 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.1429138183594,68881100
< 2023-01-12,396.6700134277344,398.489990234375,392.4200134277344,396.9599914550781,388.5523986816406,90157700
< 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.0597839355469,63903900
---
> 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.14288330078125,68881100
> 2023-01-12,396.6700134277344,398.489990234375,392.4200134277344,396.9599914550781,388.55242919921875,90157700
> 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.059814453125,63903900
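
To see which columns actually change between two downloads, a per-column comparison is more informative than a raw diff. The following is a minimal sketch using only the standard library. The helper name max_abs_diff_by_column is hypothetical, and the header row (Date,Open,High,Low,Close,Adj Close,Volume) is an assumption, because the header line does not appear in the diff output above. The sample rows are the first two rows of each file as shown in that diff.

```python
import csv
import io

# Assumed header: the diff above starts at line 2, so the header itself is not shown.
HEADER = "Date,Open,High,Low,Close,Adj Close,Volume\n"

# First two data rows of each file, copied from the diff output above.
RUN_A = HEADER + (
    "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700\n"
    "2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100\n"
)
RUN_B = HEADER + (
    "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542419433594,74850700\n"
    "2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319885253906,85934100\n"
)


def max_abs_diff_by_column(text_a, text_b):
    """Return {column: largest absolute difference} for two CSV texts.

    Both texts must have the same header and the same dates in the same order.
    """
    rows_a = list(csv.DictReader(io.StringIO(text_a)))
    rows_b = list(csv.DictReader(io.StringIO(text_b)))
    if len(rows_a) != len(rows_b):
        raise ValueError("the two files have different row counts")
    result = {}
    for row_a, row_b in zip(rows_a, rows_b):
        if row_a["Date"] != row_b["Date"]:
            raise ValueError("rows are not aligned by date")
        for column, value_a in row_a.items():
            if column == "Date":
                continue
            diff = abs(float(value_a) - float(row_b[column]))
            result[column] = max(result.get(column, 0.0), diff)
    return result


diffs = max_abs_diff_by_column(RUN_A, RUN_B)
for column, diff in diffs.items():
    print(f"{column:>10}: {diff:.10f}")
```

On these sample rows only the Adj Close column differs; Open, High, Low, Close and Volume match exactly. To compare the real files, pass the result of open(path).read() for each file.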

Debug log

Code with debug mode enabled

 % cat sp500_daily_ohlc.py 
import yfinance as yf
from datetime import datetime, date

yf.enable_debug_mode()

df = yf.download(["SPY"], date(2023, 1, 1), date(2024, 7, 2))

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

Output on the first run

 % python sp500_daily_ohlc.py
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       SPY: Yahoo GET parameters: {'period1': '2023-01-01 00:00:00-05:00', 'period2': '2024-07-02 00:00:00-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/SPY
DEBUG        params=frozendict.frozendict({'period1': 1672549200, 'period2': 1719892800, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'})
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          loaded persistent cookie
DEBUG          reusing cookie
DEBUG          crumb = 'tz63UYSUdiS'
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       SPY: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-01 13:30:00
DEBUG       SPY: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-01 09:30:00-04:00
DEBUG       SPY: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG       SPY: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
writing data into daily_20240713_194042.csv

Output from the second run

 % python sp500_daily_ohlc.py
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       SPY: Yahoo GET parameters: {'period1': '2023-01-01 00:00:00-05:00', 'period2': '2024-07-02 00:00:00-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/SPY
DEBUG        params=frozendict.frozendict({'period1': 1672549200, 'period2': 1719892800, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'})
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          loaded persistent cookie
DEBUG          reusing cookie
DEBUG          crumb = 'tz63UYSUdiS'
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       SPY: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-01 13:30:00
DEBUG       SPY: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-01 09:30:00-04:00
DEBUG       SPY: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG       SPY: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-01 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
writing data into daily_20240713_194050.csv
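
Both debug logs show identical request parameters, so the differences do not come from what is sent to Yahoo. As a sanity check, the sketch below recomputes the logged epoch values from the logged local timestamps. The fixed UTC offsets are copied from the log lines and are an assumption of this sketch; it does not reproduce how yfinance itself resolves the exchange time zone.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets as printed in the debug log ("-05:00" in January, "-04:00" in July).
EST = timezone(timedelta(hours=-5))
EDT = timezone(timedelta(hours=-4))

# 'period1': '2023-01-01 00:00:00-05:00' and 'period2': '2024-07-02 00:00:00-04:00'
period1 = int(datetime(2023, 1, 1, tzinfo=EST).timestamp())
period2 = int(datetime(2024, 7, 2, tzinfo=EDT).timestamp())

print(period1, period2)  # 1672549200 1719892800, matching the logged params
```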

Differences

 % diff daily_20240713_194042.csv daily_20240713_194050.csv | wc -l
452
 % diff daily_20240713_194042.csv daily_20240713_194050.csv | head -n 20
2c2
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.75421142578125,74850700
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700
4,5c4,5
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447265625,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604431152344,104189600
---
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
7c7
< 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.30755615234375,65358100
---
> 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.3075256347656,65358100
11,13c11,13
< 2023-01-17,398.4800109863281,400.2300109863281,397.05999755859375,397.7699890136719,389.3452453613281,62677300
< 2023-01-18,399.010009765625,400.1199951171875,391.2799987792969,391.489990234375,383.1982116699219,99632300
< 2023-01-19,389.3599853515625,391.0799865722656,387.260009765625,388.6400146484375,380.40869140625,86958900
---
> 2023-01-17,398.4800109863281,400.2300109863281,397.05999755859375,397.7699890136719,389.34521484375,62677300

Bad data proof

No response

yfinance version

0.2.40

Python version

3.12.3

Operating system

Debian GNU/Linux 12 (bookworm)

ValueRaider commented 1 month ago

Try the branch in PR #1984 (how to run)

KamarajuKusumanchi commented 1 month ago

> Try the branch in PR #1984 (how to run)

I tried it, but it did not fix the issue. Consider the following code:

 % cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date

auto_adjust = True
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

I ran this twice, then changed auto_adjust to False:

 % cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date

auto_adjust = False
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)

time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)

I ran this twice as well. The files differ across all four runs.

 % md5sum daily_auto_adjust_*.csv
f470fbd816e09be9037edef54a0f4e59  daily_auto_adjust_False_20240714_142500.csv
001d65b7fa991f4d3aa2c1b8a99cd12d  daily_auto_adjust_False_20240714_142503.csv
ee0e0a82eb6f7307e3a540f3cf30da26  daily_auto_adjust_True_20240714_142435.csv
3b1ff5053c992fcd5965230c96333dbd  daily_auto_adjust_True_20240714_142442.csv
 % diff daily_auto_adjust_False_20240714_142500.csv daily_auto_adjust_False_20240714_142503.csv | head -n 20
2,8c2,8
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700
< 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3446960449219,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604431152344,104189600
< 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451416015625,73978100
< 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.3075256347656,65358100
< 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.1428527832031,68881100
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700
> 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319580078125,85934100
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
> 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451110839844,73978100
> 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.30755615234375,65358100
> 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.14288330078125,68881100
10c10
< 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.0597839355469,63903900
---
> 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.059814453125,63903900
 % diff daily_auto_adjust_True_20240714_142435.csv daily_auto_adjust_True_20240714_142442.csv | head
2c2
< 2023-01-03,376.2290102149442,378.2453768730092,369.8275195343226,372.75421142578125,74850700
---
> 2023-01-03,376.2290410170059,378.2454078401519,369.827549812291,372.7542419433594,74850700
4,5c4,5
< 2023-01-05,373.635161717404,373.7526153349045,370.73786295793764,371.3447265625,76970500
< 2023-01-06,374.50629665205355,381.0056756304061,371.3740906518094,379.8604431152344,104189600
---
> 2023-01-05,373.63519242321297,373.7526460503659,370.73789342564294,371.3447570800781,76970500
> 2023-01-06,374.5063267394853,381.00570623999096,371.37412048760314,379.8604736328125,104189600
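
With auto_adjust=True, Open, High and Low differ as well. The numbers are consistent with each raw price being scaled by the ratio Adj Close / Close, so a small change in Adj Close carries over to every adjusted price. The sketch below checks this for 2023-01-03. The formula and the helper name auto_adjusted are assumptions made for illustration, not code taken from yfinance. The raw Open and Close come from the auto_adjust=False output, and the two adjusted closes from the auto_adjust=True output above.

```python
def auto_adjusted(price, close, adj_close):
    """Scale a raw price by Adj Close / Close (assumed adjustment formula)."""
    return price * (adj_close / close)


# 2023-01-03 values copied from the outputs above.
raw_open = 384.3699951171875
raw_close = 380.82000732421875
adj_close_run1 = 372.75421142578125  # Close column, first auto_adjust=True file
adj_close_run2 = 372.7542419433594   # Close column, second auto_adjust=True file

open_run1 = auto_adjusted(raw_open, raw_close, adj_close_run1)
open_run2 = auto_adjusted(raw_open, raw_close, adj_close_run2)

print(open_run1)              # compare with 376.2290102149442 in the first file
print(open_run2)              # compare with 376.2290410170059 in the second file
print(open_run2 - open_run1)  # the Adj Close difference, scaled by Open / Close
```

If that assumption holds, the auto_adjust=True differences are the same Adj Close inconsistency seen with auto_adjust=False, just spread across more columns.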

ValueRaider commented 1 month ago

You should have seen a new warning message.

KamarajuKusumanchi commented 1 month ago

> You should have seen a new warning message.

Yes, I see the warning message, but it does not help with reproducibility. For example,

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2))

gives the warning message. But neither

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=True)

nor

df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=False)

gives reproducible results.

ValueRaider commented 1 month ago

The PR isn't meant to fix reproducibility. The Adj Close that Yahoo returns isn't consistent between requests.
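
One observation on the numbers posted in this thread: the Adj Close values from different runs differ by whole multiples of 2^-15 (about 3.05e-5), which is the gap between adjacent float32 values in the range 256 to 512. That pattern suggests single-precision rounding noise somewhere upstream, though nothing in this thread confirms the cause. If it is rounding noise, comparing downloads with a tolerance is more useful than comparing files byte for byte. The sketch below is standard library only; the helper names and the 1e-6 relative tolerance are assumptions chosen for illustration.

```python
import math


def float32_spacing(x):
    """Gap between adjacent float32 values near x (normal, non-zero x)."""
    _, exponent = math.frexp(x)  # x = m * 2**exponent with 0.5 <= |m| < 1
    return math.ldexp(1.0, exponent - 24)  # float32 keeps 24 significant bits


def steps_apart(a, b):
    """Distance between a and b measured in float32 steps."""
    return abs(a - b) / float32_spacing(max(abs(a), abs(b)))


# Adj Close pairs for 2023-01-03 and 2023-01-04 from the first diff in the issue.
pairs = [
    (372.7543029785156, 372.7542419433594),
    (375.63201904296875, 375.6319885253906),
]

for a, b in pairs:
    same_within_tolerance = math.isclose(a, b, rel_tol=1e-6)
    print(steps_apart(a, b), same_within_tolerance)
```

Both pairs come out as a whole number of steps (2 and 1) and compare equal under the relative tolerance, so a tolerance-based check would treat these two downloads as the same data.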