ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0
14.89k stars 2.44k forks

yf.download() returning incorrect index #2101

Closed H-Ali13381 closed 8 hours ago

H-Ali13381 commented 3 weeks ago

Describe bug

yf.download() returning incorrect index

Simple code that reproduces your problem

import yfinance as yf
data = yf.download('SPY')
data.head()

image

Debug log

DEBUG Entering download()
DEBUG Disabling multithreading because DEBUG logging enabled
DEBUG Entering history()
DEBUG Entering history()
DEBUG SPY: Yahoo GET parameters: {'period1': '1925-11-20 12:49:42-05:00', 'period2': '2024-10-26 13:49:42-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG Entering get()
DEBUG Entering _make_request()
DEBUG url=https://query2.finance.yahoo.com/v8/finance/chart/SPY
DEBUG params={'period1': -1392099018, 'period2': 1729964982, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG Entering _get_cookie_and_crumb()
DEBUG cookie_mode = 'basic'
DEBUG Entering _get_cookie_and_crumb_basic()
DEBUG reusing cookie
DEBUG reusing crumb
DEBUG Exiting _get_cookie_and_crumb_basic()
DEBUG Exiting _get_cookie_and_crumb()
DEBUG response code=200
DEBUG Exiting _make_request()
DEBUG Exiting get()
DEBUG SPY: yfinance received OHLC data: 1993-01-29 14:30:00 -> 2024-10-25 13:30:00
DEBUG SPY: OHLC after cleaning: 1993-01-29 09:30:00-05:00 -> 2024-10-25 09:30:00-04:00
DEBUG SPY: OHLC after combining events: 1993-01-29 00:00:00-05:00 -> 2024-10-25 00:00:00-04:00
DEBUG SPY: yfinance returning OHLC: 1993-01-29 00:00:00-05:00 -> 2024-10-25 00:00:00-04:00
DEBUG Exiting history()
DEBUG Exiting history()
DEBUG Exiting download()
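
The log prints the request window twice: once as timezone-aware timestamps in the "Yahoo GET parameters" entry and once as Unix epoch seconds in the params entry. A minimal sketch of that conversion, using only the Python standard library, is below. It is not yfinance's own code, and the variable names are illustrative.

```python
from datetime import datetime

# Timestamps copied from the "Yahoo GET parameters" entry of the debug log.
period1 = datetime.fromisoformat("1925-11-20 12:49:42-05:00")
period2 = datetime.fromisoformat("2024-10-26 13:49:42-04:00")

# For timezone-aware datetimes, timestamp() returns seconds since
# 1970-01-01 00:00:00 UTC; dates before 1970 come out negative.
epoch1 = int(period1.timestamp())
epoch2 = int(period2.timestamp())

print(epoch1)  # -1392099018
print(epoch2)  # 1729964982
```

The negative period1 only means the requested window starts before 1970; per the log, the data actually received starts at 1993-01-29.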

Bad data proof

No response

yfinance version

0.2.48

Python version

No response

Operating system

Windows

dhruvan2006 commented 3 weeks ago

Simple fix

Screenshot 2024-10-26 213010

H-Ali13381 commented 3 weeks ago

A quick fix I found is to rename the columns:

data.index.name = 'Date'
data.columns = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
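
Assigning a hard-coded list depends on the column order yfinance happens to return. A less order-dependent sketch, assuming the two-level column header (Price, Ticker) reported in this thread, drops the Ticker level with pandas instead. The frame below is a hand-built stand-in using values quoted in the thread, not a live download.

```python
import pandas as pd

# Stand-in for the 0.2.48 result of yf.download('SPY'): two column levels.
columns = pd.MultiIndex.from_product(
    [["Adj Close", "Close", "High", "Low", "Open", "Volume"], ["SPY"]],
    names=["Price", "Ticker"],
)
data = pd.DataFrame(
    [[24.608625, 43.9375, 43.96875, 43.75, 43.96875, 1003200]],
    index=pd.DatetimeIndex(["1993-01-29"], tz="UTC", name="Date"),
    columns=columns,
)

# Drop the Ticker level so only the price field names remain.
flat = data.copy()
flat.columns = flat.columns.droplevel("Ticker")

print(list(flat.columns))
```

Because the level is dropped by name, the remaining labels keep whatever order the download produced, so no values can end up under the wrong label.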

ValueRaider commented 3 weeks ago

I can't reproduce integer index - is that the problem?

>>> yf.download('SPY', session=session)
[*********************100%***********************]  1 of 1 completed
Price                       Adj Close       Close        High         Low        Open    Volume
Ticker                            SPY         SPY         SPY         SPY         SPY       SPY
Date                                                                                           
1993-01-29 00:00:00+00:00   24.608625   43.937500   43.968750   43.750000   43.968750   1003200
...

Faheem12005 commented 3 weeks ago

The integer index wasn't the problem for me; it was the ticker index. His PR solves the issue.

ValueRaider commented 2 weeks ago

If you are only fetching one symbol, why not use yf.Ticker('SPY').history()? @antoniouaa @cyrom0

cyrom0 commented 2 weeks ago

The output differs between 0.2.47 and 0.2.48. I downgraded to 0.2.47 and the output looks good. This is the output of 0.2.47:

data = yf.download('SPY')
[*100%***] 1 of 1 completed
data.head()
                           Adj Close     Close      High       Low      Open   Volume
Date
1993-01-29 00:00:00+00:00  24.608624  43.93750  43.96875  43.75000  43.96875  1003200
1993-02-01 00:00:00+00:00  24.783661  44.25000  44.25000  43.96875  43.96875   480500
1993-02-02 00:00:00+00:00  24.836142  44.34375  44.37500  44.12500  44.21875   201300
1993-02-03 00:00:00+00:00  25.098690  44.81250  44.84375  44.37500  44.40625   529400
1993-02-04 00:00:00+00:00  25.203718  45.00000  45.09375  44.46875  44.96875   531500

As you can see, the header/index in 0.2.48 is different:

Price   Adj Close  Close  High  Low  Open  Volume
Ticker        SPY    SPY   SPY  SPY   SPY     SPY
Date
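
Code that has to run against both releases could normalize the header before using it. The helper below is a hypothetical sketch (flatten_single_ticker is not part of yfinance): it leaves flat columns untouched and collapses a two-level header only when the last level holds a single symbol. It is exercised on hand-built frames rather than live downloads.

```python
import pandas as pd


def flatten_single_ticker(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with single-level columns when that is unambiguous."""
    out = df.copy()
    if isinstance(out.columns, pd.MultiIndex):
        tickers = out.columns.get_level_values(-1).unique()
        if len(tickers) != 1:
            raise ValueError("more than one ticker; refusing to flatten")
        # Remove the last (Ticker) level, keeping the price field names.
        out.columns = out.columns.droplevel(-1)
    return out


fields = ["Adj Close", "Close", "High", "Low", "Open", "Volume"]
row = [[24.608624, 43.9375, 43.96875, 43.75, 43.96875, 1003200]]

# Flat header, as in the 0.2.47 output.
old_style = pd.DataFrame(row, columns=fields)
# Two-level header, as in the 0.2.48 output.
new_style = pd.DataFrame(
    row,
    columns=pd.MultiIndex.from_product([fields, ["SPY"]], names=["Price", "Ticker"]),
)

print(list(flatten_single_ticker(old_style).columns))
print(list(flatten_single_ticker(new_style).columns))
```

Raising on multiple tickers is a deliberate choice in this sketch: dropping the level there would leave duplicate column names.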

cyrom0 commented 4 days ago

> If you are only fetching one symbol, why not use yf.Ticker('SPY').history()? @antoniouaa @cyrom0

I wasn't aware that I could do the same using yf.Ticker('SPY').history(). I upgraded to 0.2.49 and see that

data = yf.Ticker('SPY').history(period="max", interval='1d', actions=False, auto_adjust=False)
data.index = pd.to_datetime(data.index, utc=True)

produces output equivalent to yf.download('SPY') for my purposes.

Tracing the code a bit, and based on my limited understanding, yf.download("SPY") is a wrapper that handles multiple tickers, so yf.Ticker('SPY').history() looks more efficient for the single-ticker case. I will modify my code to use yf.Ticker('SPY').history(). Basically, yf.download('SPY') does some additional processing, such as pd.to_datetime(data.index, utc=True), compared to yf.Ticker('SPY').history(). The index of the yf.Ticker('SPY').history() output cannot be treated as a datetime without calling pd.to_datetime(data.index, utc=True).
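
What a pd.to_datetime(..., utc=True) call does to an index can be checked without contacting Yahoo. The sketch below assumes the index is timezone-aware in America/New_York, as the -05:00 offsets in the debug log suggest, and uses two hand-written dates. It contrasts converting to UTC with relabeling the same wall-clock dates as UTC. Which of the two yf.download() applies internally is not established in this thread.

```python
import pandas as pd

# Hand-written stand-in for a daily index localized to the exchange timezone.
idx = pd.DatetimeIndex(
    ["1993-01-29", "1993-02-01"], tz="America/New_York", name="Date"
)

# utc=True converts aware timestamps to UTC: midnight EST becomes 05:00 UTC.
converted = pd.to_datetime(idx, utc=True)

# Alternative: drop the timezone, then label the same wall-clock times as UTC.
relabeled = idx.tz_localize(None).tz_localize("UTC")

print(converted[0])  # 1993-01-29 05:00:00+00:00
print(relabeled[0])  # 1993-01-29 00:00:00+00:00
```

If the goal is to match dates printed as 00:00:00+00:00, the second form is the one that keeps midnight. The first keeps the same instant in time but shifts the displayed hour.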

Thanks for your help.