ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0
13.25k stars, 2.34k forks

cannot reindex from a duplicate axis #224

Closed tcalbrecht closed 1 year ago

tcalbrecht commented 4 years ago

Produced by this code:

    import pandas as pd
    import yfinance as yf

    tl = ["IAU", "IJH", "IWB", "JNK", "SHY", "SPY", "TOTL", "VEA", "VWO", "VFIAX"]
    data = pd.DataFrame(columns=tl)

    for ticker in tl:
        data[ticker] = yf.download(tickers=ticker, period="max", interval="1wk")["Adj Close"]

Works fine when interval="1d".

anonz322 commented 4 years ago

Read the doc ;)

    # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
    # (optional, default is '1mo')

tcalbrecht commented 4 years ago

Well, both my period and interval settings matched the doc, so your comment was less than helpful.

        # use "period" instead of start/end
        # valid periods: 1d,5d,1mo,3mo,6mo,1y,2y,5y,10y,ytd,max
        # (optional, default is '1mo')
        period = "ytd",

        # fetch data by interval (including intraday if period < 60 days)
        # valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
        # (optional, default is '1d')
        interval = "1m",

The problem was with the interval option, not period.

According to the README, both "1d" and "1wk" are valid interval settings. For that example, "1d" works and "1wk" gives a runtime error.

anonz322 commented 4 years ago

Whoops, my bad, I read too fast. You're totally right, sorry about that!

If you print what the function returns at each iteration (without filling the DataFrame), you can see (on a shorter period) that some index values are duplicated (one row with a value, another with a NaN), like this for the 1m/1wk SPY:

    2020-03-16    227.46
    2020-03-20       NaN
    2020-03-20    229.40

It throws a pandas error, since your df.index shouldn't contain duplicate values (see https://stackoverflow.com/questions/27236275/what-does-valueerror-cannot-reindex-from-a-duplicate-axis-mean).
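A minimal sketch of that failure, using the three SPY rows quoted above. The target index is made up and only stands in for the index of the DataFrame being filled. The exact wording of the error depends on the pandas version, so the sketch only captures the message rather than assuming it:

```python
import pandas as pd

# Weekly closes with the duplicated 2020-03-20 label quoted above.
weekly = pd.Series(
    [227.46, float("nan"), 229.40],
    index=pd.to_datetime(["2020-03-16", "2020-03-20", "2020-03-20"]),
    name="Adj Close",
)

# Hypothetical target index (an assumption for illustration only).
target = pd.to_datetime(["2020-03-09", "2020-03-16", "2020-03-20"])

try:
    # Aligning a Series whose index has duplicates onto another index fails.
    weekly.reindex(target)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(weekly.index.is_unique)
print(error_message)
```

Assigning such a Series to a column of a DataFrame with a different index goes through the same alignment step, which is where the loop in the original report stops.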

My guess is that it comes from "non-numeric" events (here, a dividend, as seen on https://finance.yahoo.com/quote/SPY/history?period1=1579651200&period2=1584835200&interval=1wk&filter=history&frequency=1d ).

The quick workaround is therefore to add a dropna() to your yf call!
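A rough sketch of that workaround on synthetic data, so it runs without a network call. The `build_table` helper and the `OTHER` series are hypothetical. In the real loop, the equivalent would be calling `.dropna()` on the `['Adj Close']` result of `yf.download(...)`. Note that this only helps when the duplicated row is the one holding the NaN, as in the SPY rows quoted earlier:

```python
import pandas as pd


def build_table(series_by_ticker):
    """Drop NaN rows from each series, then align them as columns."""
    cleaned = {ticker: s.dropna() for ticker, s in series_by_ticker.items()}
    return pd.concat(cleaned, axis=1).sort_index()


# SPY rows as quoted in this thread, including the duplicated label.
spy = pd.Series(
    [227.46, float("nan"), 229.40],
    index=pd.to_datetime(["2020-03-16", "2020-03-20", "2020-03-20"]),
)

# Made-up second series with a different date range (illustration only).
other = pd.Series(
    [10.0, 11.0],
    index=pd.to_datetime(["2020-03-09", "2020-03-16"]),
)

table = build_table({"SPY": spy, "OTHER": other})
print(table)
```

Collecting the cleaned series first and aligning them once with `pd.concat` also avoids depending on whichever ticker happens to define the DataFrame index first.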

anonz322 commented 4 years ago

OK, I think I see what the issue is here:

Following the JSON call ( https://query1.finance.yahoo.com/v8/finance/chart/SPY?range=3mo&interval=1wk&events=%22div,splits%22 ) shows that events > dividends sits under the last price timestamp (but contains a different timestamp as its "date" value!). If you add actions=True as a param to yf.download, this results in two rows on the same timestamp, one of them filled with NaNs except for the dividend column.
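To illustrate how two rows can end up on one timestamp, here is a small pandas sketch. It is not yfinance's actual merging code. The dividend amount is made up, and the two frames only mimic the shape described here. Stacking prices and dividends row-wise leaves a second, mostly empty row for the shared timestamp, while joining them column-wise keeps one row per timestamp:

```python
import pandas as pd

# Hypothetical weekly prices (values echo the SPY rows quoted earlier).
prices = pd.DataFrame(
    {"Adj Close": [227.46, 229.40]},
    index=pd.to_datetime(["2020-03-16", "2020-03-20"]),
)

# Hypothetical dividend event on the same timestamp as the last price row.
# The amount 1.0 is invented for illustration.
dividends = pd.DataFrame(
    {"Dividends": [1.0]},
    index=pd.to_datetime(["2020-03-20"]),
)

# Row-wise stacking: the dividend becomes an extra row with NaN prices.
stacked = pd.concat([prices, dividends])

# Column-wise join: the dividend lands on the existing price row.
joined = prices.join(dividends)

print(stacked)
print(joined)
```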

I can work on a solution, but I don't know what the original behavior was, @ranaroussi: does parse_actions in utils.py return the dividend on the "date" timestamp (the "true" dividend date), or is the dividend value added as an extra column on the existing price timestamp (i.e. just appended to an existing OHLC price row)?

ValueRaider commented 1 year ago

Correct, dividends were causing this problem in weekly & monthly data (bad merging). Fixed in the latest versions.