Open KamarajuKusumanchi opened 4 months ago
Try the branch in PR #1984 (how to run)
I tried it. It did not fix the issue. Consider the following code:
% cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date
auto_adjust = True
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)
time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)
I ran this twice. Then I changed auto_adjust to False:
% cat sp500_daily_ohlc.py
import yfinance as yf
from datetime import datetime, date
auto_adjust = False
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=auto_adjust)
time_stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
file_name = f"daily_auto_adjust_{auto_adjust}_{time_stamp}.csv"
print(f"writing data into {file_name}")
df.to_csv(file_name)
I ran this twice. The files differ across all four runs.
% md5sum daily_auto_adjust_*.csv
f470fbd816e09be9037edef54a0f4e59 daily_auto_adjust_False_20240714_142500.csv
001d65b7fa991f4d3aa2c1b8a99cd12d daily_auto_adjust_False_20240714_142503.csv
ee0e0a82eb6f7307e3a540f3cf30da26 daily_auto_adjust_True_20240714_142435.csv
3b1ff5053c992fcd5965230c96333dbd daily_auto_adjust_True_20240714_142442.csv
% diff daily_auto_adjust_False_20240714_142500.csv daily_auto_adjust_False_20240714_142503.csv | head -n 20
2,8c2,8
< 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700
< 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.63201904296875,85934100
< 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3446960449219,76970500
< 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8604431152344,104189600
< 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451416015625,73978100
< 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.3075256347656,65358100
< 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.1428527832031,68881100
---
> 2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700
> 2023-01-04,383.17999267578125,385.8800048828125,380.0,383.760009765625,375.6319580078125,85934100
> 2023-01-05,381.7200012207031,381.8399963378906,378.760009765625,379.3800048828125,371.3447570800781,76970500
> 2023-01-06,382.6099853515625,389.25,379.4100036621094,388.0799865722656,379.8605041503906,104189600
> 2023-01-09,390.3699951171875,393.70001220703125,387.6700134277344,387.8599853515625,379.6451110839844,73978100
> 2023-01-10,387.25,390.6499938964844,386.2699890136719,390.5799865722656,382.30755615234375,65358100
> 2023-01-11,392.2300109863281,395.6000061035156,391.3800048828125,395.5199890136719,387.14288330078125,68881100
10c10
< 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.0597839355469,63903900
---
> 2023-01-13,393.6199951171875,399.1000061035156,393.3399963378906,398.5,390.059814453125,63903900
% diff daily_auto_adjust_True_20240714_142435.csv daily_auto_adjust_True_20240714_142442.csv | head
2c2
< 2023-01-03,376.2290102149442,378.2453768730092,369.8275195343226,372.75421142578125,74850700
---
> 2023-01-03,376.2290410170059,378.2454078401519,369.827549812291,372.7542419433594,74850700
4,5c4,5
< 2023-01-05,373.635161717404,373.7526153349045,370.73786295793764,371.3447265625,76970500
< 2023-01-06,374.50629665205355,381.0056756304061,371.3740906518094,379.8604431152344,104189600
---
> 2023-01-05,373.63519242321297,373.7526460503659,370.73789342564294,371.3447570800781,76970500
> 2023-01-06,374.5063267394853,381.00570623999096,371.37412048760314,379.8604736328125,104189600
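Since the rows above differ only far to the right of the decimal point, a byte-level check such as md5sum or diff will always flag them. A minimal sketch of a tolerance-based comparison follows, using only the standard library. The helper name `max_relative_difference` is hypothetical and is not part of yfinance; the two sample rows are copied from the diff output above.

```python
import csv
import io


def max_relative_difference(csv_text_a, csv_text_b):
    """Return the largest relative difference between numeric cells of two CSV texts."""
    rows_a = list(csv.reader(io.StringIO(csv_text_a)))
    rows_b = list(csv.reader(io.StringIO(csv_text_b)))
    if len(rows_a) != len(rows_b):
        raise ValueError("row counts differ")
    worst = 0.0
    for row_a, row_b in zip(rows_a, rows_b):
        if len(row_a) != len(row_b):
            raise ValueError("column counts differ")
        for cell_a, cell_b in zip(row_a, row_b):
            try:
                a, b = float(cell_a), float(cell_b)
            except ValueError:
                # Non-numeric cell (date or header): require an exact match.
                if cell_a != cell_b:
                    raise ValueError(f"text cells differ: {cell_a!r} vs {cell_b!r}")
                continue
            scale = max(abs(a), abs(b))
            if scale > 0:
                worst = max(worst, abs(a - b) / scale)
    return worst


# First differing row of the two auto_adjust=False files shown above.
run_1 = "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7542724609375,74850700\n"
run_2 = "2023-01-03,384.3699951171875,386.42999267578125,377.8299865722656,380.82000732421875,372.7543029785156,74850700\n"

worst = max_relative_difference(run_1, run_2)
print(f"largest relative difference: {worst:.3e}")
```

For these two rows the only differing cell is the Adj Close column, and the relative difference is well below 1e-6. Whether that is an acceptable tolerance depends on what the data is used for.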
You should have seen a new warning message.
Yes, I see the warning message, but it does not help with reproducibility. For example,
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2))
gives the warning message. But neither
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=True)
nor
df = yf.download(["SPY"], start=date(2023, 1, 1), end=date(2024, 7, 2), auto_adjust=False)
gives reproducible results.
It's not meant to fix the reproducibility. Adj Close isn't consistent from Yahoo.
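If the upstream values cannot be made consistent, one possible workaround on the caller's side is to round prices before writing or hashing them. The sketch below is an assumption, not something yfinance does: the helpers `normalise_cell` and `canonical_csv_digest` are hypothetical, and the choice of two decimal places is arbitrary. The sample rows are the first differing row of the two auto_adjust=True files in this thread.

```python
import csv
import hashlib
import io


def normalise_cell(cell, ndigits):
    """Round float-like cells to ndigits; leave integers and text untouched."""
    try:
        int(cell)
        return cell  # integers such as volume are kept verbatim
    except ValueError:
        pass
    try:
        return f"{float(cell):.{ndigits}f}"
    except ValueError:
        return cell  # dates and header names pass through unchanged


def canonical_csv_digest(csv_text, ndigits=2):
    """Re-serialise the CSV with rounded prices and return its MD5 hex digest."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(csv_text)):
        writer.writerow([normalise_cell(cell, ndigits) for cell in row])
    return hashlib.md5(out.getvalue().encode("utf-8")).hexdigest()


run_a = "2023-01-03,376.2290102149442,378.2453768730092,369.8275195343226,372.75421142578125,74850700\n"
run_b = "2023-01-03,376.2290410170059,378.2454078401519,369.827549812291,372.7542419433594,74850700\n"

raw_a = hashlib.md5(run_a.encode("utf-8")).hexdigest()
raw_b = hashlib.md5(run_b.encode("utf-8")).hexdigest()
print("raw digests equal:    ", raw_a == raw_b)
print("rounded digests equal:", canonical_csv_digest(run_a) == canonical_csv_digest(run_b))
```

With a pandas DataFrame, calling `df.round(2)` before `df.to_csv(...)` would have a similar effect. Rounding is not a guarantee: a value that sits close to a rounding boundary can still land on different sides in two runs.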
Describe bug
The OHLC data returned by yf.download() is not reproducible across multiple runs. For example, if I download OHLC data of SPY two times, the numbers are slightly different. Ideally, the numbers should be the same.
Simple code that reproduces your problem
Consider the following code. It downloads daily OHLC data for SPY and writes it to a file.
Run the code twice
Ideally, these two files should be the same. But they are not.
Debug log
Code with debug mode enabled
Output on the first run
Output from the second run
Differences
Bad data proof
No response
yfinance version
0.2.40
Python version
3.12.3
Operating system
Debian GNU/Linux 12 (bookworm)