ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

Why can different intervals give different daily volume totals? #758

Closed (prRZ5F4LXZ closed this issue 8 months ago)

prRZ5F4LXZ commented 3 years ago

See below. Depending on the interval, the daily volume total can be different.

>>> import yfinance as yf
>>> ticker='CFLT'
>>> data = yf.Ticker(ticker).history(interval='1m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    1273718
dtype: int64
>>> data = yf.Ticker(ticker).history(interval='5m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    957058
dtype: int64
>>> data = yf.Ticker(ticker).history(interval='15m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    957058
dtype: int64

But this discrepancy seems to be specific to certain tickers. For example, the daily volume totals for MSFT are the same across all three intervals. Does anybody know what causes it? Is there an alternative data source that provides data without such silly errors?

>>> ticker='MSFT'
>>> data = yf.Ticker(ticker).history(interval='1m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    24297684
dtype: int64
>>> data = yf.Ticker(ticker).history(interval='5m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    24297684
dtype: int64
>>> data = yf.Ticker(ticker).history(interval='15m', start='2021-07-06', end='2021-07-07')
>>> print(data[['Volume']].sum())
Volume    24297684
dtype: int64
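
One way to narrow down where the totals diverge is to sum the 1m bars into 5-minute buckets and compare each bucket with the 5m bar that was returned. The following is only a minimal sketch: `compare_buckets` is a hypothetical helper (not part of yfinance), it assumes both frames have a DatetimeIndex and a `Volume` column like the ones `history()` returns above, and the numbers are made up so that it runs offline.

```python
import pandas as pd


def compare_buckets(fine: pd.DataFrame, coarse: pd.DataFrame, rule: str = "5min") -> pd.DataFrame:
    """Sum the finer bars into `rule`-sized buckets and return the buckets
    whose summed volume differs from the coarser bars."""
    resampled = fine["Volume"].resample(rule).sum()
    table = pd.concat({"resampled": resampled, "reported": coarse["Volume"]}, axis=1).fillna(0)
    table["diff"] = table["resampled"] - table["reported"]
    return table[table["diff"] != 0]


# Made-up volumes, for illustration only (not real CFLT data).
one_min = pd.DataFrame(
    {"Volume": [100, 200, 300, 400, 500, 600]},
    index=pd.to_datetime([
        "2021-07-06 09:30", "2021-07-06 09:31", "2021-07-06 09:33",
        "2021-07-06 09:35", "2021-07-06 09:36", "2021-07-06 09:39",
    ]),
)
five_min = pd.DataFrame(
    {"Volume": [600, 1200]},
    index=pd.to_datetime(["2021-07-06 09:30", "2021-07-06 09:35"]),
)

mismatches = compare_buckets(one_min, five_min)
print(mismatches)  # only the 09:35 bucket differs in this toy data
```

With real data, the two frames would be the '1m' and '5m' results of the `history()` calls shown above. The rows that come back point at the time buckets where the two intervals disagree.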

stefangab23 commented 3 years ago

I reproduced the discrepancy between the '1m' and '15m' intervals. The '1m' data has missing entries; in the excerpt below, for example, there are no rows for 15:15 to 15:17 or for 15:22.

                            Open         High        Low       Close    Volume  Dividends  Stock Splits
Date
2021-07-06 15:12:00-04:00  42.980000  42.980000  42.980000  42.980000     104          0             0
2021-07-06 15:13:00-04:00  43.014999  43.014999  42.980000  43.000000    2153          0             0
2021-07-06 15:14:00-04:00  43.009998  43.020000  43.009998  43.020000    1320          0             0
2021-07-06 15:18:00-04:00  42.980000  43.025002  42.980000  43.025002     700          0             0
2021-07-06 15:19:00-04:00  43.070000  43.070000  43.070000  43.070000     200          0             0
2021-07-06 15:20:00-04:00  43.001701  43.080002  43.001701  43.080002    1210          0             0
2021-07-06 15:21:00-04:00  43.008202  43.169998  43.008202  43.169998     748          0             0
2021-07-06 15:23:00-04:00  43.139999  43.139999  43.009998  43.009998     905          0             0
2021-07-06 15:24:00-04:00  43.000000  43.000000  43.000000  43.000000     250          0             0
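
Gaps like the ones in the excerpt above can be listed programmatically instead of by eye. This is a rough sketch, assuming the frame has a timezone-aware DatetimeIndex as in the output above; `missing_minutes` is a hypothetical helper, not a yfinance function, and the timestamps are copied from the excerpt so that the example runs offline. Note that a missing minute may simply be a minute with no trades, so this only locates gaps and does not explain them.

```python
import pandas as pd


def missing_minutes(index: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Return every 1-minute timestamp between the first and last bar
    that has no row in `index`."""
    expected = pd.date_range(index.min(), index.max(), freq="min")
    return expected.difference(index)


# Timestamps taken from the excerpt above.
idx = pd.to_datetime([
    "2021-07-06 15:12:00-04:00", "2021-07-06 15:13:00-04:00",
    "2021-07-06 15:14:00-04:00", "2021-07-06 15:18:00-04:00",
    "2021-07-06 15:19:00-04:00", "2021-07-06 15:20:00-04:00",
    "2021-07-06 15:21:00-04:00", "2021-07-06 15:23:00-04:00",
    "2021-07-06 15:24:00-04:00",
])

gaps = missing_minutes(idx)
print(gaps.strftime("%H:%M").tolist())  # ['15:15', '15:16', '15:17', '15:22']
```

On a full day of data, `missing_minutes(data.index)` would be called on the '1m' frame directly.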

My suggestion would be to use interval='1d', since the volume it returns seems to match the one shown on finance.yahoo.com.

>>> data = yf.Ticker('CFLT').history(interval='1d', start='2021-07-06', end='2021-07-07')
>>> data
             Open       High        Low  Close  Volume  Dividends  Stock Splits
Date
2021-07-06  43.18  44.599998  42.639999   43.0  972900          0             0