Closed tcalbrecht closed 1 year ago
Read the doc ;)
# (optional, default is '1mo')
Well, both my period and interval settings matched the doc, so your comment was less than helpful.
# use "period" instead of start/end
# valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
# (optional, default is '1mo')
period = "ytd",
# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval = "1m",
The problem was with the interval option, not period.
According to the README, both "1d" and "1wk" are valid interval settings. For that example, "1d" works and "1wk" gives a runtime error.
Oops, my bad, I read too fast. You're totally right, sorry about that!
If you print what the function returns at each iteration (without filling the dataframe), you can see (on a shorter period) that some index values are duplicated (one with a value, another with a NaN), like this for the 1m/1wk SPY:
2020-03-16 227.46
2020-03-20 NaN
2020-03-20 229.40
It throws a pandas error, since your df.index shouldn't contain duplicate values (see https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean).
My guess is that it comes from "non-numeric" events (here, a dividend, as seen on https://finance.yahoo.com/quote/SPY/history?period1=1579651200&period2=1584835200&interval=1wk&filter=history&frequency=1d ).
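To make the failure concrete, here is a minimal sketch that rebuilds the three values quoted above as a pandas Series and reproduces the error. It is only an illustration: the variable names are mine, and it does not call yfinance at all.

```python
import pandas as pd

# The three weekly values quoted above; 2020-03-20 appears twice.
idx = pd.to_datetime(["2020-03-16", "2020-03-20", "2020-03-20"])
close = pd.Series([227.46, float("nan"), 229.40], index=idx, name="Close")

# Show every row whose timestamp occurs more than once.
dupes = close[close.index.duplicated(keep=False)]
print(dupes)

# Reindexing a Series with duplicated labels is what pandas refuses to do.
try:
    close.reindex(pd.to_datetime(["2020-03-16", "2020-03-20"]))
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print("ValueError:", exc)
```

The exact error message differs between pandas versions, so the sketch only relies on the exception type.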
The quick workaround is therefore adding a dropna() to your yf call!
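A minimal sketch of that workaround, using a hand-built frame with the same three rows instead of a live download (the frame and its single "Close" column are my own stand-in for what the yf call returns):

```python
import pandas as pd

# Stand-in for the downloaded data: the NaN row shares its timestamp
# with a real price row.
idx = pd.to_datetime(["2020-03-16", "2020-03-20", "2020-03-20"])
df = pd.DataFrame({"Close": [227.46, float("nan"), 229.40]}, index=idx)

# dropna() removes the NaN row, which leaves one row per timestamp.
cleaned = df.dropna()
print(cleaned)
print(cleaned.index.is_unique)
```

In practice this would be chained onto the result of yf.download(...). Keep in mind that dropna() with no arguments drops any row containing a NaN in any column, so it is a blunt fix and may discard more than the duplicated row.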
OK, I see what seems to be the issue here:
Following the JSON call ( https://query1.finance.yahoo.com/v8/finance/chart/SPY?range=3mo&interval=1wk&events=%22div,splits%22 ) shows that events>dividends is keyed under the last price timestamp (but contains another timestamp as its "date" value!). If you add actions=True as a param to yf.download, this results in two rows on the same timestamp, one filled with NAs except for the dividend column.
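Here is a rough sketch of how two rows can end up on the same timestamp, and one possible way to collapse them. This is not yfinance's actual merging code; the frames are built by hand, and the dividend amount is a made-up placeholder, not real data.

```python
import pandas as pd

# Price rows, using the closes quoted earlier in this thread.
prices = pd.DataFrame(
    {"Close": [227.46, 229.40]},
    index=pd.to_datetime(["2020-03-16", "2020-03-20"]),
)

# Dividend event keyed on the last price timestamp.
# The amount 1.0 is a placeholder, not the real dividend.
dividends = pd.DataFrame(
    {"Dividends": [1.0]},
    index=pd.to_datetime(["2020-03-20"]),
)

# Stacking the frames row-wise gives two rows for 2020-03-20:
# one with only Close, one with only Dividends.
naive = pd.concat([prices, dividends]).sort_index()
print(naive)

# One possible fix: group by timestamp and keep the first non-null
# value in each column, so each timestamp ends up on a single row.
merged = naive.groupby(level=0).first()
print(merged)
```

Whether the dividend belongs on the price row or on its own "date" timestamp is a separate design question; the sketch only shows the first option.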
I can work on a solution, but I don't know what the original behavior was, @ranaroussi: does parse_actions in utils.py return the dividend on the "date" timestamp (the "true" dividend date), or is the dividend value added as an extra column on the existing price timestamp (i.e. just adding the dividend value at the end of an existing OHLC price row)?
Correct, dividends were causing this problem in weekly & monthly (bad merging). Fixed in latest versions.
Produced by this code:
Works fine where interval="1d"