ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Data Period Varies With History and Download Functions #1987

Closed alexnwk closed 4 months ago

alexnwk commented 4 months ago

Describe bug

When using .history or .download on specific tickers, the resulting OHLC data varies in length and sometimes ends on the wrong date. The behavior is intermittent and can flip from correct to incorrect within seconds. For example, today is 7/17/2024. Running the loop code below with a start date of 1/2/2023 returns a dataframe covering either 1/2/2023 -> 7/17/2024 or 1/2/2023 -> 7/16/2024. I have noticed this only with specific tickers, including ^DJI and ^GSPC (the Dow Jones and S&P 500 indices, respectively). I have not seen it on company-specific tickers (e.g. AAPL, NVDA) or on other index-like tickers (e.g. ^TNX (10-year treasury yield), ^VIX (volatility)). However, because the problem occurs infrequently, I could have missed inconsistent results with other tickers.

To mitigate this issue and ensure downloads run through the current date, I wrote a separate application that checks the data frame's end date and, if it does not match the current date, waits 20 seconds and re-downloads via yfinance. It is a band-aid for the underlying issue.

I am uncertain whether this issue is related to #1982. That issue appears to be about data inconsistency across the data frame, whereas this one concerns how yfinance calculates (or pulls) the end date.

from datetime import datetime
import time

import pytz
import yfinance as yf

newYorkTz = pytz.timezone("America/New_York")

def check_for_yf_date_mismatch(df):
    # For unknown reasons, yfinance sometimes returns data ending on the wrong date.
    # The problem appears to go away after a few seconds, so this check reports
    # whether the last row's date differs from today's date in New York.
    last_period = df.index[-1].strftime('%Y-%m-%d')
    now = datetime.now(newYorkTz).strftime('%Y-%m-%d')
    return last_period != now

def yf_download_period(ticker, interval, data_period):
    data = yf.Ticker(ticker).history(interval=interval, period=data_period)
    # Retry up to 10 times, waiting 20 seconds before each new attempt.
    for _ in range(10):
        if not check_for_yf_date_mismatch(data):
            break
        print('The incorrect date was found in YFinance data pull for: ' + ticker + '. Attempting to re-download and correct.')
        time.sleep(20)
        data = yf.Ticker(ticker).history(interval=interval, period=data_period)
    # Give up with an error if the last date is still wrong after all retries.
    if check_for_yf_date_mismatch(data):
        raise Exception('YF failed to pull the correct latest date. Multiple attempts were made to correct the dataset, with no success.')
    return data

Simple code that reproduces your problem

import yfinance as yf
import time

yf.enable_debug_mode()

for x in range(10):
    data = yf.Ticker('^DJI').history(interval='1d', start='2023-01-02')
    print(data.tail(1))
    time.sleep(5)

for x in range(10):
    data = yf.download('^DJI', interval='1d', start='2023-01-02')
    print(data.tail(1))
    time.sleep(5)

Debug log

DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:26:49-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233609, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          loaded persistent cookie
DEBUG          reusing cookie
DEBUG          crumb = 'ELQMuCjlz1x'
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-16 13:30:00
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-16 09:30:00-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                   Open          High          Low         Close     Adj Close     Volume
Date                                                                                     
2024-07-16  40263.78125  40988.808594  40263.78125  40954.480469  40954.480469  306390000
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:26:54-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233614, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          reusing cookie
DEBUG          reusing crumb
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-16 13:30:00
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-16 09:30:00-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                   Open          High          Low         Close     Adj Close     Volume
Date                                                                                     
2024-07-16  40263.78125  40988.808594  40263.78125  40954.480469  40954.480469  306390000
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:27:00-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233620, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          reusing cookie
DEBUG          reusing crumb
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-17 16:26:59
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-17 12:26:59-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                    Open          High           Low        Close    Adj Close     Volume
Date                                                                                     
2024-07-17  40862.601562  41190.089844  40849.710938  41178.46875  41178.46875  200886637
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:27:05-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233625, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          reusing cookie
DEBUG          reusing crumb
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-17 16:27:04
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-17 12:27:04-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                    Open          High           Low         Close     Adj Close     Volume
Date                                                                                       
2024-07-17  40862.601562  41190.089844  40849.710938  41174.230469  41174.230469  200936832
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:27:10-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233630, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          reusing cookie
DEBUG          reusing crumb
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-17 16:27:09
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-17 12:27:09-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-17 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                    Open          High           Low    Close  Adj Close     Volume
Date                                                                               
2024-07-17  40862.601562  41190.089844  40849.710938  41172.0    41172.0  200971412
DEBUG    Entering download()
DEBUG     Disabling multithreading because DEBUG logging enabled
DEBUG     Entering history()
DEBUG      Entering history()
DEBUG       ^DJI: Yahoo GET parameters: {'period1': '2023-01-02 00:00:00-05:00', 'period2': '2024-07-17 12:27:15-04:00', 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG       Entering get()
DEBUG        url=https://query2.finance.yahoo.com/v8/finance/chart/^DJI
DEBUG        params={'period1': 1672635600, 'period2': 1721233635, 'interval': '1d', 'includePrePost': False, 'events': 'div,splits,capitalGains'}
DEBUG        Entering _get_cookie_and_crumb()
DEBUG         cookie_mode = 'basic'
DEBUG         Entering _get_cookie_and_crumb_basic()
DEBUG          reusing cookie
DEBUG          reusing crumb
DEBUG         Exiting _get_cookie_and_crumb_basic()
DEBUG        Exiting _get_cookie_and_crumb()
DEBUG        response code=200
DEBUG       Exiting get()
DEBUG       ^DJI: yfinance received OHLC data: 2023-01-03 14:30:00 -> 2024-07-16 13:30:00
DEBUG       ^DJI: OHLC after cleaning: 2023-01-03 09:30:00-05:00 -> 2024-07-16 09:30:00-04:00
DEBUG       ^DJI: OHLC after combining events: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG       ^DJI: yfinance returning OHLC: 2023-01-03 00:00:00-05:00 -> 2024-07-16 00:00:00-04:00
DEBUG      Exiting history()
DEBUG     Exiting history()
DEBUG    Exiting download()
                   Open          High          Low         Close     Adj Close     Volume
Date                                                                                     
2024-07-16  40263.78125  40988.808594  40263.78125  40954.480469  40954.480469  306390000
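
As a sanity check on the log above, here is a minimal sketch that decodes two of the logged period2 values and compares them with the last bar Yahoo returned. The helper name describe_request and the fixed UTC-4 offset (New York in July) are assumptions made for illustration; neither is part of yfinance.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-4 offset stands in for New York time in July 2024.
EDT = timezone(timedelta(hours=-4))

def describe_request(period2_epoch, last_bar_utc):
    """Compare the requested end (period2) with the last bar received.

    Hypothetical helper: period2_epoch is the integer from the
    'params=' log line, last_bar_utc is the end timestamp from the
    'yfinance received OHLC data' log line (naive UTC).
    """
    requested_end = datetime.fromtimestamp(period2_epoch, tz=timezone.utc).astimezone(EDT)
    last_bar = (
        datetime.strptime(last_bar_utc, "%Y-%m-%d %H:%M:%S")
        .replace(tzinfo=timezone.utc)
        .astimezone(EDT)
    )
    return {
        "requested_end_local": requested_end.isoformat(),
        "last_bar_local_date": last_bar.date().isoformat(),
        # True when the response stops before the requested end date.
        "stale": last_bar.date() < requested_end.date(),
    }

# Values copied from the first and third requests in the debug log.
first = describe_request(1721233609, "2024-07-16 13:30:00")
third = describe_request(1721233620, "2024-07-17 16:26:59")
print(first)
print(third)
```

Both requests ask for data up to roughly 12:27 New York time on 7/17, yet only the third response includes the 7/17 bar. That suggests the variation comes from the response rather than from the request parameters.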

Bad data proof

                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41165.628906  197533095        0.0           0.0
                                  Open          High          Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                               
2024-07-16 00:00:00-04:00  40263.78125  40988.808594  40263.78125  40954.480469  306390000        0.0           0.0
                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41169.261719  197622476        0.0           0.0
                                  Open          High          Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                               
2024-07-16 00:00:00-04:00  40263.78125  40988.808594  40263.78125  40954.480469  306390000        0.0           0.0
                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41170.808594  197722027        0.0           0.0
                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41167.789062  197758748        0.0           0.0
                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41161.671875  197817592        0.0           0.0
                                  Open          High          Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                               
2024-07-16 00:00:00-04:00  40263.78125  40988.808594  40263.78125  40954.480469  306390000        0.0           0.0
                                   Open          High           Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                                 
2024-07-17 00:00:00-04:00  40862.601562  41190.089844  40849.710938  41164.621094  197897898        0.0           0.0
                                  Open          High          Low         Close     Volume  Dividends  Stock Splits
Date                                                                                                               
2024-07-16 00:00:00-04:00  40263.78125  40988.808594  40263.78125  40954.480469  306390000        0.0           0.0

yfinance version

0.2.40

Python version

3.9

Operating system

No response

ValueRaider commented 4 months ago

Try setting end to tomorrow.

alexnwk commented 4 months ago

This fixed the issue. I would argue the underlying bug still exists, since it is odd that the end date flips back and forth, but the solution is simple enough that a deeper dive is not warranted. Here is the example code I used to work around the issue when repeatedly pulling data through the current date.


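from datetime import datetime, timedelta

import pytz
import yfinance as yf

newYorkTz = pytz.timezone("America/New_York")
# Use tomorrow's date (New York time) as the end of the requested range.
tomorrow = (datetime.now(newYorkTz) + timedelta(days=1)).strftime('%Y-%m-%d')
data = yf.Ticker('^DJI').history(interval='1d', start='2023-01-02', end=tomorrow)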
ValueRaider commented 4 months ago

You could change default to tomorrow #1084
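
A minimal sketch of what "default to tomorrow" could look like on the caller's side, assuming a small wrapper around Ticker.history. The names tomorrow_str and history_through_today are hypothetical and not part of yfinance, and this is not the change discussed in #1084; it only illustrates the idea.

```python
from datetime import datetime, timedelta

def tomorrow_str(now=None, tz_name="America/New_York"):
    """Return tomorrow's date as YYYY-MM-DD (hypothetical helper).

    If `now` is omitted, the current time in `tz_name` is used.
    """
    if now is None:
        from zoneinfo import ZoneInfo  # standard library in Python 3.9+
        now = datetime.now(ZoneInfo(tz_name))
    return (now + timedelta(days=1)).strftime("%Y-%m-%d")

def history_through_today(ticker, start, interval="1d", end=None):
    """Call Ticker.history with `end` defaulting to tomorrow (hypothetical wrapper)."""
    import yfinance as yf  # imported lazily so the date helper works without it
    if end is None:
        end = tomorrow_str()
    return yf.Ticker(ticker).history(interval=interval, start=start, end=end)

# Example with a fixed clock, matching the date in this report.
print(tomorrow_str(datetime(2024, 7, 17, 12, 26, 49)))
```

With this wrapper, history_through_today('^DJI', start='2023-01-02') would pass end set to tomorrow's date on every call, which is the workaround described above.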