ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

sqlite database is locked when using multithread download #1274

Closed gogog22510 closed 1 year ago

gogog22510 commented 1 year ago

data = yf.download(  # or pdr.get_data_yahoo(...

# tickers list or string as well, cannot be a numpy.ndarray
tickers=tickers,

# start date
start=start_date,

# end date
end=end_date,

# fetch data by interval (including intraday if period < 60 days)
# valid intervals: 1m,2m,5m,15m,30m,60m,90m,1h,1d,5d,1wk,1mo,3mo
# (optional, default is '1d')
interval="1d",

# group by ticker (to access via data['SPY'])
# (optional, default is 'column')
group_by='ticker',

# adjust all OHLC automatically
# (optional, default is False)
auto_adjust=False,

# download pre/post regular market hours data
# (optional, default is False)
prepost=True,

# use threads for mass downloading? (True/False/Integer)
# (optional, default is True)
threads=True,

# proxy URL scheme to use when downloading?
# (optional, default is None)
proxy=None

)


- The error message
    - `- NVDA: OperationalError('database is locked')`

When running the multithreaded download from my Flask server in a Docker image, I sometimes see this error message. The failing symbol can be a different one each time, and it still produces an empty dataset.

Any suggestions on how to detect that an error occurred when calling `yf.download(threads=True)`, or is there a way to fix the sqlite database lock issue? (Maybe by increasing the timeout?)
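
For reference, the same error text can be reproduced with the standard-library `sqlite3` module alone. This is only a sketch of the general mechanism and not yfinance's cache code: the `demo_kv` table and the `try_write` helper are made up for illustration. One connection holds the write lock, and a second connection with a short `timeout` gives up with `database is locked`.

```python
import os
import sqlite3
import tempfile


def try_write(path, timeout):
    """Attempt one insert; return None on success, or the error text on failure."""
    # `timeout` is how long sqlite3 waits for a lock before raising.
    conn = sqlite3.connect(path, timeout=timeout)
    try:
        conn.execute("INSERT INTO demo_kv VALUES (?, ?)", ("NVDA", "America/New_York"))
        conn.commit()
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts this connection in autocommit mode,
# so the transaction below is opened and closed explicitly.
holder = sqlite3.connect(db_path, isolation_level=None)
holder.execute("CREATE TABLE demo_kv (key TEXT, value TEXT)")
holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep holding it
holder.execute("INSERT INTO demo_kv VALUES (?, ?)", ("MSFT", "America/New_York"))

blocked = try_write(db_path, timeout=0.1)        # lock is still held
holder.execute("COMMIT")                         # release the write lock
after_release = try_write(db_path, timeout=0.1)  # lock is free now
holder.close()

print("while locked:", blocked)
print("after release:", after_release)
```

A larger `timeout` only makes the second writer wait longer; it helps if the lock holder finishes within that window, and it does nothing about whatever caused two writers to collide in the first place.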
ValueRaider commented 1 year ago

This shouldn't happen because database access is managed by an in-memory mutex, so it should be thread-safe. Are you running multiple Docker instances with yfinance running concurrently?

gogog22510 commented 1 year ago

I'm running one Docker instance with multiple threads. The command in the Docker image is from the official Flask guide, something like `gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 'flaskr:create_app()'`

gogog22510 commented 1 year ago

My theory is that, since I'm running multiple threads, the yfinance API can be called from multiple requests at once. I saw that the mutex is part of the cache instance (https://github.com/ranaroussi/yfinance/blob/96ff2141079d8627c6b137ee802bea80e9f1229b/yfinance/utils.py#L701), and the cache instance is lazily initialized here (https://github.com/ranaroussi/yfinance/blob/96ff2141079d8627c6b137ee802bea80e9f1229b/yfinance/base.py#L780).

Therefore, if the yfinance API is called for the first time by multiple threads simultaneously, the cache instance may be initialized twice, each copy with its own mutex, and the database lock error can occur. (So far, I have only seen this error right after starting a new Docker instance and sending requests from my frontend.)

ValueRaider commented 1 year ago

Good theory. The fix is simple; do you mind adding and testing it yourself? In `_TzCache::__init__()`, append at the bottom: `self.tz_db`.
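
As a rough illustration of that idea (a sketch with made-up names, not the actual `_TzCache` implementation): a class exposes its database through a lazy property, and the constructor touches that property once so the database already exists before any worker thread uses the object.

```python
import threading


class DemoTzCache:
    """Hypothetical stand-in; `db` plays the role of a lazily created database."""

    def __init__(self):
        self._db = None
        self._mutex = threading.Lock()
        self.db_init_count = 0
        # Touch the lazy property once, while only the constructing
        # thread can see this object, so it is created eagerly.
        self.db

    @property
    def db(self):
        if self._db is None:
            self.db_init_count += 1
            self._db = {}           # assumption: a dict stands in for the real store
        return self._db

    def store(self, key, value):
        with self._mutex:
            self.db[key] = value


cache = DemoTzCache()
workers = [
    threading.Thread(target=cache.store, args=(f"T{i}", "America/New_York"))
    for i in range(8)
]
for w in workers:
    w.start()
for w in workers:
    w.join()

print("db created", cache.db_init_count, "time(s);", len(cache.db), "keys stored")
```

This only covers the lazily created attribute inside one object; it assumes the object itself is constructed once and then shared between threads.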

gogog22510 commented 1 year ago

The fix makes sense, let me try it on my side.

sajanrav commented 1 year ago

I am still getting this error on newer versions (e.g. 0.2.18). This is what I tried:

In [2]: import yfinance as yf

In [3]: df = yf.download('MSFT', start='2023-01-01', end='2023-01-31')
[*********************100%***********************]  1 of 1 completed

1 Failed download:
- MSFT: IntegrityError('NOT NULL constraint failed: kv.key')

However, if I try the above using version 0.1.96, it works.

ValueRaider commented 1 year ago

The latest version on GitHub should fix this; try it and report back. Instructions: #1080

sajanrav commented 1 year ago

Thanks, it works in the latest version. (I tried it using the instructions.)

ValueRaider commented 1 year ago

Great, will send out a release soon. Just for completeness, the fix is #1504.

ValueRaider commented 1 year ago

Actually, you should be able to revert to the normal pip version now, because that error only appears during a one-off migration process. Please report back.

sajanrav commented 1 year ago

The pip version works now. Thanks @ValueRaider!