ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Weekly data is incomplete #521

Open seeohsee opened 3 years ago

seeohsee commented 3 years ago

I don't think I understand how to properly pull weekly data with yfinance. I seem to get different dates in return depending on the start and end date I select. In reality, I just want M-F data in a single pandas dataframe row. Sometimes the data I get back has dates listed as Mondays, sometimes Wednesdays, etc. How can I just get a Monday-Friday trading week in return?
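
One way to get one row per Monday-Friday trading week, independent of how Yahoo labels its weekly bars, might be to download daily data and aggregate it locally with pandas. The sketch below is not what yfinance does internally. It uses made-up daily bars (the numbers are assumptions for illustration only), and each weekly row is labeled with the Friday that ends the week.

```python
import numpy as np
import pandas as pd

# Synthetic daily bars for two full trading weeks (Mon 2020-11-02 to Fri 2020-11-13).
# These values are invented; real data would come from a daily download.
days = pd.bdate_range("2020-11-02", "2020-11-13")
daily = pd.DataFrame(
    {
        "Open": np.arange(10.0, 20.0),
        "High": np.arange(11.0, 21.0),
        "Low": np.arange(9.0, 19.0),
        "Close": np.arange(10.5, 20.5),
        "Volume": np.full(10, 100),
    },
    index=days,
)

# "W-FRI" groups rows into weeks ending on Friday, so each bin covers Mon-Fri.
weekly = daily.resample("W-FRI").agg(
    {
        "Open": "first",   # first open of the week
        "High": "max",     # highest high of the week
        "Low": "min",      # lowest low of the week
        "Close": "last",   # last close of the week
        "Volume": "sum",   # total volume over the week
    }
)
print(weekly)
```

Weeks shortened by a holiday still produce a single row, built from whichever trading days exist in that week.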

Further, it seems the data can be incomplete. For example, if I want to compare weekly data between two tickers for the same time period, the end result should (in theory) contain the same number of weekly data points. In reality, I get two different results. Here is an example:

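# Setup assumed by this snippet: pdr is pandas_datareader, patched by yfinance
import yfinance as yf
from pandas_datareader import data as pdr
yf.pdr_override()

# Get the first dataframe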
df1 = pdr.get_data_yahoo('GDX', '2014-12-29', '2020-11-29', interval='1wk')
df1 = df1.reset_index().drop_duplicates(subset='Date', keep='last').set_index('Date')
df1 = df1.dropna(how='all')

# Get the second dataframe
df2 = pdr.get_data_yahoo('QQQ', '2014-12-29', '2020-11-29', interval='1wk')
df2 = df2.reset_index().drop_duplicates(subset='Date', keep='last').set_index('Date')
df2 = df2.dropna(how='all')

# Compare the results. Since the start and end dates are the same between the two calls, they should have the same shape.
print(df1.shape) # This equals (306, 6)
print(df2.shape) # This equals (298, 6)

Clearly, there are missing entries in df2 that exist in df1. How can I get the two dataframes to contain the same number of entries, with the exact same dates and row indices?

I can examine the differences, like:

df2.index.difference(df1.index)
# Returns DatetimeIndex(['2015-12-21', '2016-12-19'], dtype='datetime64[ns]', name='Date', freq=None)

df1.index.difference(df2.index)
# Returns DatetimeIndex(['2017-09-18', '2018-03-19', '2018-06-18', '2018-09-24', '2018-12-24', '2019-03-18', '2019-06-24', '2019-09-23', '2020-03-23', '2020-06-22'], dtype='datetime64[ns]', name='Date', freq=None)

BradKML commented 2 years ago

Because, depending on the index, some series may be reported monthly, others weekly or "business-daily". They do not come from a single standard vendor, so they do not have to line up.
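
Whatever the cause of the mismatch, two frames can be forced onto the same dates by keeping only the index values they share. This is a minimal sketch with invented data (the frames and values below are assumptions, not the GDX/QQQ results from the report). Note that it drops the non-matching rows rather than recovering them.

```python
import pandas as pd

# Two made-up weekly series whose dates only partly overlap.
df1 = pd.DataFrame(
    {"Close": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2020-06-08", "2020-06-15", "2020-06-22"]),
)
df2 = pd.DataFrame(
    {"Close": [10.0, 20.0, 30.0]},
    index=pd.to_datetime(["2020-06-08", "2020-06-15", "2020-06-24"]),
)

# Keep only the dates present in both frames.
common = df1.index.intersection(df2.index)
df1_aligned = df1.loc[common]
df2_aligned = df2.loc[common]

print(df1_aligned.shape, df2_aligned.shape)
```

After this step both frames have identical indices, so row-by-row comparisons are safe, at the cost of losing the weeks that appear in only one of them.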

ValueRaider commented 2 years ago

This is actually caused by dividend events not being merged properly. This PR will fix it: https://github.com/ranaroussi/yfinance/pull/1069

ValueRaider commented 1 year ago

Just for clarity ...

"Sometimes the data I get back has dates listed as Mondays, sometimes Wednesdays, etc."

This is still an issue, but it is caused by Yahoo. The solution is to shift the start date back a few days, e.g. to a Saturday. Stupid, but it works. One day yfinance will handle weekly data properly (there are other problems on Yahoo's side).
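
The start-date shift can be computed rather than worked out by hand. The helper below is a rough sketch: `previous_saturday` is a hypothetical name introduced here for illustration, not part of yfinance, and it only does the date arithmetic.

```python
from datetime import date, timedelta


def previous_saturday(d: date) -> date:
    """Return d if it is a Saturday, otherwise the closest Saturday before d."""
    # date.weekday(): Monday=0 ... Saturday=5, Sunday=6
    days_back = (d.weekday() - 5) % 7
    return d - timedelta(days=days_back)


# 2014-12-29 is the Monday used as the start date in the original report.
start = previous_saturday(date(2014, 12, 29))
print(start)  # 2014-12-27
```

The result could then be passed as the `start` argument of a weekly request (`interval='1wk'`). Whether the returned rows end up dated on Mondays still depends on what Yahoo sends back.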