ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0

ParserError('Error tokenizing data. ...') #1526

Closed PeterSchober005 closed 1 year ago

PeterSchober005 commented 1 year ago

Hello,

I am running Python 3.11.2 and yfinance 0.2.18.

I am trying to download historical data by executing the following line:

yf.download(tickers="AAPL", start='2021-1-1', auto_adjust=True, threads=True)

I am getting the following output (error):

"- AAPL: ParserError('Error tokenizing data. C error: Expected 2 fields in line 1098, saw 3\n')"

[*********************100%***********************] 1 of 1 completed
1 Failed download:
- AAPL: ParserError('Error tokenizing data. C error: Expected 2 fields in line 1098, saw 3\n')

| | Open | High | Low | Close | Adj Close | Volume |
| -- | -- | -- | -- | -- | -- | -- |

Anyone having the same issue? Many thanks!

PeterSchober005 commented 1 year ago

By the way, I am using Pandas 2.0.1. Operating system is Windows 10.

Many thanks again! :)

ValueRaider commented 1 year ago

Let's keep the issue thread concise and focused.

Install pre-release via PIP then enable debug logging and post output. And post any feedback you have on the logging.
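One way to turn on debug logging, sketched with only the standard `logging` module. It assumes yfinance emits its messages through a logger named "yfinance"; that name is an assumption, not something confirmed in this thread, and the pre-release may offer its own helper instead.

```python
import logging


def enable_debug_logging(logger_name: str = "yfinance") -> logging.Logger:
    """Attach a stream handler and set DEBUG level on the named logger.

    The default logger name is an assumption about how the library
    names its logger; pass a different name if needed.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    # Avoid stacking duplicate handlers when called more than once.
    if not any(isinstance(h, logging.StreamHandler) for h in logger.handlers):
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


log = enable_debug_logging()
log.debug("debug logging is on")
```

Run this before calling the download function so that any DEBUG records, including tracebacks, reach the console.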

PeterSchober005 commented 1 year ago

Ok, ValueRaider. I have installed the pre-release version and enabled debug logging. Pandas is 2.0.1, Python 3.11.2, Windows 10.

Input: yf.download(tickers="AAPL", start='2021-1-1', auto_adjust=True, threads=True)


Output:

ERROR 1 Failed download:
ERROR ['AAPL']: ParserError('Error tokenizing data. C error: Expected 2 fields in line 1098, saw 3\n')
DEBUG ['AAPL']: Traceback (most recent call last):
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 243, in _download_one_threaded
    data = _download_one(ticker, start, end, auto_adjust, back_adjust, repair,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 262, in _download_one
    return Ticker(ticker).history(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 158, in history
    tz = self._get_ticker_tz(proxy, timeout)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 1013, in _get_ticker_tz
    cache = utils.get_tz_cache()
            ^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 978, in get_tz_cache
    _tz_cache = _TzCache()
                ^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 902, in __init__
    self._migrate_cache_tkr_tz()
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 943, in _migrate_cache_tkr_tz
    df = _pd.read_csv(old_cache_file_path, index_col="Ticker")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py", line 583, in _read
    return parser.read(nrows)
           ^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas\_libs\parsers.pyx", line 812, in pandas._libs.parsers.TextReader.read_low_memory
  File "pandas\_libs\parsers.pyx", line 873, in pandas._libs.parsers.TextReader._read_rows
  File "pandas\_libs\parsers.pyx", line 848, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas\_libs\parsers.pyx", line 859, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "pandas\_libs\parsers.pyx", line 2025, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 1098, saw 3

Open    High    Low Close   Adj Close   Volume

Date
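The traceback above points at `_pd.read_csv(old_cache_file_path, index_col="Ticker")` failing on line 1098 of the old timezone cache file, where a row has three fields instead of two. A minimal sketch, using only the standard `csv` module, for locating such rows. The function name and the sample data are hypothetical; the real cache file's path and exact layout are not shown in this thread.

```python
import csv
import io
from typing import Iterable, List, Tuple


def find_malformed_rows(lines: Iterable[str]) -> List[Tuple[int, List[str]]]:
    """Return (line_number, fields) for rows whose width differs from the header."""
    reader = csv.reader(lines)
    header = next(reader, None)
    if header is None:
        return []  # empty input, nothing to check
    expected = len(header)
    bad = []
    for row in reader:
        # Blank lines come back as empty lists; skip them.
        if row and len(row) != expected:
            # reader.line_num is the 1-based count of lines read so far.
            bad.append((reader.line_num, row))
    return bad


# Hypothetical sample: the third line has one field too many.
sample = io.StringIO("Ticker,Tz\nAAPL,America/New_York\nMSFT,America/New_York,extra\n")
bad_rows = find_malformed_rows(sample)
print(bad_rows)
```

For a real file, the same function can be fed an open file handle, for example `open(path, newline="")`.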


A big thank you again, ValueRaider!

ValueRaider commented 1 year ago

Grab Git branch fix/tz-cache-migrate-error - instructions #1080. Only need to run once, then normal PIP versions should work. So not considering this urgent to send out.

What do you think of the new logging / error reporting?

ValueRaider commented 1 year ago

Or, easier option: temporarily downgrade to 0.1.96

PeterSchober005 commented 1 year ago

ValueRaider, many thanks for the solutions! I implemented the easier option (downgrading to 0.1.96), since option 1 was somewhat difficult for me to implement. I do not have Git and installing it seemed somewhat tricky. Anyway, it is running fine now. A very huge thank you, @ValueRaider !! :)

PeterSchober005 commented 1 year ago

@ValueRaider many thanks again for your help on this issue! Just one final question, please, before I close this issue: will the fix be included in the coming release, or is it already in the pre-release? I am new to GitHub, so I am not very familiar with the terminology here... :) Many thanks again!

ValueRaider commented 1 year ago

It's only in dev branch currently so leave issue open, no urgency to send it out.

The pre-release (aka beta) contains the new logging functionality, that's completely different.

PeterSchober005 commented 1 year ago

Hello ValueRaider,

I noticed that I need some functions that are not available in 0.1.96. So I installed git and executed the following line:

pip install git+https://github.com/ranaroussi/yfinance.git@fix/tz-cache-migrate-error

Output was:

Collecting git+https://github.com/ranaroussi/yfinance.git@fix/tz-cache-migrate-error
  Cloning https://github.com/ranaroussi/yfinance.git (to revision fix/tz-cache-migrate-error) to c:\users\peter\appdata\local\temp\pip-req-build-p3bjlifc
  Running command git clone --filter=blob:none --quiet https://github.com/ranaroussi/yfinance.git 'C:\Users\peter\AppData\Local\Temp\pip-req-build-p3bjlifc'
  WARNING: Did not find branch or tag 'fix/tz-cache-migrate-error', assuming revision or ref.
  Running command git checkout -q fix/tz-cache-migrate-error
  error: pathspec 'fix/tz-cache-migrate-error' did not match any file(s) known to git
  error: subprocess-exited-with-error

  × git checkout -q fix/tz-cache-migrate-error did not run successfully.
  │ exit code: 1
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git checkout -q fix/tz-cache-migrate-error did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.


Obviously it could not find the fix/tz-cache-migrate-error branch. I guess this is because the branch was deleted, as written here: https://github.com/ranaroussi/yfinance/pull/1528

So should I stay on 0.1.96? There is no chance for me to use the download function of the current release version of yfinance, right?

Many thanks in advance again, @ValueRaider ! :)

ValueRaider commented 1 year ago

I thought you confirmed the issue fixed by downgrading to 0.1.96. Only need to run the fix once: https://github.com/ranaroussi/yfinance/issues/1526#issuecomment-1551465548

PeterSchober005 commented 1 year ago

Yes, the download function is working now with 0.1.96. But there are other functions that 0.1.96 does not include. However, these functions are implemented in the current version. The function I am referring to is whether a ticker has options. This function is not provided in 0.1.96. I tried to implement @fix/tz-cache-migrate-error via git but it failed... I think it was removed (https://github.com/ranaroussi/yfinance/pull/1528)

ValueRaider commented 1 year ago

"Only need to run once, then normal PIP versions should work"

What do you think that means?

PeterSchober005 commented 1 year ago

Sorry but I can't follow you... 😅

ValueRaider commented 1 year ago

You only need to run the 'bug fix' once and the bug is permanently fixed. Trust me.

PeterSchober005 commented 1 year ago

@ValueRaider thanks for the info! So after I ran the bug fix and re-installed the current version of yfinance, I get the following output:

Input: stockdata = yf.download("AAPL",'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)

Output: [*********************100%***********************] 1 of 1 completed

1 Failed download:

Another try: Input: stockdata = yf.download("AAPL",'2021-1-1', auto_adjust=True, threads=True)

Output: [*********************100%***********************] 1 of 1 completed

1 Failed download:

I have no idea what the issue is here... 😅

But many thanks again, @ValueRaider I need patience! ;-)

ValueRaider commented 1 year ago

I need the stack trace. ~Switch from download to yf.Ticker.history(), that will print it.~ Stick with download and instead enable debug logging.

PeterSchober005 commented 1 year ago

Hi @ValueRaider

I enabled debug logging and I am still getting exactly the same output as before:

Input: stockdata = yf.download("AAPL",'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)

Output: [*********************100%***********************] 1 of 1 completed

1 Failed download:

Input: stockdata = yf.download("AAPL",'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)

Output: [*********************100%***********************] 1 of 1 completed

1 Failed download:

Many thanks again, @ValueRaider ! :)

ValueRaider commented 1 year ago

Doesn't look like pre-release. Confirm your yfinance version: print(yf.__version__)
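A small sketch for checking whether a version string such as the one printed by `yf.__version__` denotes a pre-release. It covers only the common `a`, `b` and `rc` suffixes and is a simplification; the helper name is hypothetical, and the third-party `packaging` library offers a fuller parser.

```python
import re

# Release digits followed by an alpha, beta or release-candidate marker.
_PRE_RELEASE = re.compile(r"^\d+(?:\.\d+)*(?:a|b|rc)\d+")


def is_prerelease(version: str) -> bool:
    """Return True when the version string carries an a/b/rc suffix."""
    return _PRE_RELEASE.match(version) is not None


for candidate in ("0.2.18", "0.2.19b3"):
    print(candidate, is_prerelease(candidate))
```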

PeterSchober005 commented 1 year ago

Good morning @ValueRaider!

The yfinance version that I am running at the moment, as reported by print(yf.__version__), is 0.2.19b3.

I executed the download function again and got the following error messages:


First run:

Here is the input: stockdata = yf.download("AAPL",'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)

The output:
[*********************100%***********************] 1 of 1 completed
ERROR 1 Failed download:
ERROR ['AAPL']: IntegrityError('NOT NULL constraint failed: kv.key')
DEBUG ['AAPL']: Traceback (most recent call last):
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 243, in _download_one_threaded
    data = _download_one(ticker, start, end, auto_adjust, back_adjust, repair,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 262, in _download_one
    return Ticker(ticker).history(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 158, in history
    tz = self._get_ticker_tz(proxy, timeout)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 1013, in _get_ticker_tz
    cache = utils.get_tz_cache()
            ^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 978, in get_tz_cache
    _tz_cache = _TzCache()
                ^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 902, in __init__
    self._migrate_cache_tkr_tz()
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 949, in _migrate_cache_tkr_tz
    self.tz_db.bulk_set(df.to_dict()['Tz'])
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 882, in bulk_set
    self.conn.executemany('replace into "kv" (key, value) values (?,?)', records)
sqlite3.IntegrityError: NOT NULL constraint failed: kv.key


Second run: Input: stockdata = yf.download("AAPL",'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)

Output:
[*********************100%***********************] 1 of 1 completed
ERROR 1 Failed download:
ERROR ['AAPL']: OperationalError('database is locked')
DEBUG ['AAPL']: Traceback (most recent call last):
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 243, in _download_one_threaded
    data = _download_one(ticker, start, end, auto_adjust, back_adjust, repair,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py", line 262, in _download_one
    return Ticker(ticker).history(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 158, in history
    tz = self._get_ticker_tz(proxy, timeout)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\base.py", line 1013, in _get_ticker_tz
    cache = utils.get_tz_cache()
            ^^^^^^^^^^^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 978, in get_tz_cache
    _tz_cache = _TzCache()
                ^^^^^^^^^^
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 902, in __init__
    self._migrate_cache_tkr_tz()
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 949, in _migrate_cache_tkr_tz
    self.tz_db.bulk_set(df.to_dict()['Tz'])
  File "c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\utils.py", line 882, in bulk_set
    self.conn.executemany('replace into "kv" (key, value) values (?,?)', records)
sqlite3.OperationalError: database is locked


Quick note: I think the problem is that I ran "pip install git+https://github.com/ranaroussi/yfinance.git@fix/tz-cache-migrate-error" AFTER it was removed... Hence, I got the error message that is shown in https://github.com/ranaroussi/yfinance/issues/1526#issuecomment-1556170785

Is it possible to restore the fix and to keep it until it is built in the coming release? This would allow me to install and use all upcoming versions of yfinance...

Many thanks again, @ValueRaider ! Appreciate your support and thanks for your patience with me! :)

ValueRaider commented 1 year ago

"Is it possible to restore the fix and to keep it until it is built in the coming release?"

If you look more carefully, this current error is different to the first. Can you email me the contents of the folder printed by yf.utils._cache_dir+"/py-yfinance/", then I can give you a quick-fix for this while I investigate. My email is on my profile.
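For comparison with the first error, here is a minimal reproduction of the `NOT NULL constraint failed: kv.key` message using an in-memory SQLite database. The `kv` table schema is an assumption made for illustration (the thread shows only the `replace into "kv" (key, value)` statement), and the records are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the thread shows only the statement, not the table definition.
conn.execute('create table "kv" (key TEXT NOT NULL PRIMARY KEY, value TEXT)')

# Made-up records; the second one has no key.
records = [("AAPL", "America/New_York"), (None, "Europe/London")]

try:
    with conn:  # commits on success, rolls back on error
        conn.executemany('replace into "kv" (key, value) values (?,?)', records)
    error_message = None
except sqlite3.IntegrityError as exc:
    error_message = str(exc)

print(error_message)
conn.close()
```

A row with a missing key is one way to trigger this message; whether that was the cause here is not established in the thread.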

ValueRaider commented 1 year ago

... are you serious? I don't have the patience for this.

ValueRaider commented 1 year ago

Ok you eventually figured it out. Fix by deleting those files.
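A sketch of that clean-up, assuming the cache lives in a `py-yfinance` folder under the directory reported by `yf.utils._cache_dir`, as quoted earlier in this thread. The helper name is hypothetical, and the base directory is passed in explicitly so that nothing here depends on yfinance being importable.

```python
from pathlib import Path
from typing import List


def clear_cache_files(base_dir: str, subfolder: str = "py-yfinance") -> List[str]:
    """Delete regular files directly inside base_dir/subfolder; return their names."""
    cache_dir = Path(base_dir) / subfolder
    removed: List[str] = []
    if not cache_dir.is_dir():
        return removed  # nothing to clean up
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_file():
            entry.unlink()
            removed.append(entry.name)
    return removed
```

Usage might look like `clear_cache_files(yf.utils._cache_dir)`; printing the returned names shows which files were removed.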

PeterSchober005 commented 1 year ago

A very big thank you, @ValueRaider !

I have some questions though:

  1. How come the fix "@fix/tz-cache-migrate-error" worked although I have always received the message that the file could not be found?
  2. Can I install other versions (like the last release version) without having to execute the fix "@fix/tz-cache-migrate-error"?
  3. Why were there files in yf.utils._cache_dir+"/py-yfinance/" at all if I had to delete them, and why am I the first one to have this problem? Should I uninstall Python and all packages and install them again? I guess something is wrongly configured on my side...

Anyway, many thanks for your support and patience again! :)

PeterSchober005 commented 1 year ago

@ValueRaider I just got another bug when executing the following line:


Input:
start = time.time()
stockdata = yf.download(ticker_list_r3000,'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)
stockdata.index=pd.DatetimeIndex(stockdata.index)
print(time.time()-start)

Output:
[ 1% ] 39 of 2634 completed
ERROR SAFE WI: No timezone found, symbol may be delisted
[** 12% ] 329 of 2634 completed
ERROR ADRO: No timezone found, symbol may be delisted
[* 19% ] 495 of 2634 completed
ERROR MSFUT: No timezone found, symbol may be delisted
[*** 22% ] 583 of 2634 completed
ERROR UHALB: No timezone found, symbol may be delisted
[**** 24% ] 627 of 2634 completed
ERROR LENB: No timezone found, symbol may be delisted
[**** 26% ] 678 of 2634 completed
ERROR XTSLA: No timezone found, symbol may be delisted
[** 30% ] 780 of 2634 completed
ERROR ARD: No timezone found, symbol may be delisted
[* 35% ] 914 of 2634 completed
ERROR ESM3: No timezone found, symbol may be delisted
[*** 40% ] 1044 of 2634 completed
ERROR PFHC: No timezone found, symbol may be delisted
[* 40% ] 1059 of 2634 completed
ERROR RTYM3: No timezone found, symbol may be delisted
[**** 41% ] 1074 of 2634 completed
ERROR HEIA: No timezone found, symbol may be delisted
[*** 44% ] 1167 of 2634 completed
ERROR BFA: No timezone found, symbol may be delisted
[**49% ] 1278 of 2634 completed
ERROR CWENA: No timezone found, symbol may be delisted
[**49% ] 1295 of 2634 completed
ERROR LGFA: No timezone found, symbol may be delisted
[**52% ] 1370 of 2634 completed
ERROR GTXI: No timezone found, symbol may be delisted
[**53% ] 1393 of 2634 completed
ERROR GEFB: No timezone found, symbol may be delisted
[**53% ] 1394 of 2634 completed
ERROR BFB: No timezone found, symbol may be delisted
[**56% ] 1464 of 2634 completed
ERROR CR WI: No timezone found, symbol may be delisted
[****58%* ] 1540 of 2634 completed
ERROR RXO WI: No timezone found, symbol may be delisted
[**62%*** ] 1632 of 2634 completed
ERROR LGFB: No timezone found, symbol may be delisted
[**70%* ] 1841 of 2634 completed
ERROR BRKB: No timezone found, symbol may be delisted
[**76%*** ] 1990 of 2634 completed
ERROR P5N994: No timezone found, symbol may be delisted
[**78%**** ] 2060 of 2634 completed
ERROR MOGA: No timezone found, symbol may be delisted
[**82%** ] 2149 of 2634 completed
ERROR BHM WI: No timezone found, symbol may be delisted
[**95% ] 2497 of 2634 completed
ERROR TFM: No price data found, symbol may be delisted (1d 2021-1-1 -> 2023-05-23)
[*********************100%***********************] 2634 of 2634 completed
ERROR 25 Failed downloads:
ERROR ['SAFE WI', 'ADRO', 'MSFUT', 'UHALB', 'LENB', 'XTSLA', 'ARD', 'ESM3', 'PFHC', 'RTYM3', 'HEIA', 'BFA', 'CWENA', 'LGFA', 'GTXI', 'GEFB', 'BFB', 'CR WI', 'RXO WI', 'LGFB', 'BRKB', 'P5N994', 'MOGA', 'BHM WI']: No timezone found, symbol may be delisted
ERROR ['TFM']: No price data found, symbol may be delisted (1d 2021-1-1 -> 2023-05-23)


KeyError                                  Traceback (most recent call last)
Cell In[6], line 2
      1 start = time.time()
----> 2 stockdata = yf.download(ticker_list_r3000,'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)
      3 stockdata.index=pd.DatetimeIndex(stockdata.index)
      4 print(time.time()-start)

File c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py:178, in download(tickers, start, end, actions, threads, ignore_tz, group_by, auto_adjust, back_adjust, repair, keepna, progress, period, show_errors, interval, prepost, proxy, rounding, timeout)
    176 tbs = {}
    177 for ticker in shared._ERRORS:
--> 178     tb = shared._TRACEBACKS[ticker]
    179     if not tb in tbs:
    180         tbs[tb] = [ticker]

KeyError: 'SAFE WI'

PeterSchober005 commented 1 year ago

The problem is the following part:

KeyError                                  Traceback (most recent call last)
Cell In[6], line 2
      1 start = time.time()
----> 2 stockdata = yf.download(ticker_list_r3000,'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)
      3 stockdata.index=pd.DatetimeIndex(stockdata.index)
      4 print(time.time()-start)

File c:\Users\peter\AppData\Local\Programs\Python\Python311\Lib\site-packages\yfinance\multi.py:178, in download(tickers, start, end, actions, threads, ignore_tz, group_by, auto_adjust, back_adjust, repair, keepna, progress, period, show_errors, interval, prepost, proxy, rounding, timeout)
    176 tbs = {}
    177 for ticker in shared._ERRORS:
--> 178     tb = shared._TRACEBACKS[ticker]
    179     if not tb in tbs:
    180         tbs[tb] = [ticker]

KeyError: 'SAFE WI'


The first part is OK: the "error messages" saying that some symbols may have been delisted are expected; they were always there.
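The failing lines in the traceback group tickers by traceback and index `shared._TRACEBACKS[ticker]` directly, which raises `KeyError` when a ticker has an error message but no recorded traceback. A minimal sketch of a more tolerant grouping follows. Plain dictionaries stand in for `shared._ERRORS` and `shared._TRACEBACKS`, the function name is hypothetical, and this is not the library's actual fix.

```python
from typing import Dict, List


def group_by_traceback(errors: Dict[str, str],
                       tracebacks: Dict[str, str]) -> Dict[str, List[str]]:
    """Group failed tickers by their traceback text.

    Tickers with no recorded traceback are grouped under an empty string
    instead of raising KeyError.
    """
    groups: Dict[str, List[str]] = {}
    for ticker in errors:
        tb = tracebacks.get(ticker, "")
        groups.setdefault(tb, []).append(ticker)
    return groups


# Made-up data: 'SAFE WI' has an error message but no traceback entry.
errors = {"SAFE WI": "No timezone found", "TFM": "No price data found"}
tracebacks = {"TFM": "Traceback (most recent call last): ..."}
grouped = group_by_traceback(errors, tracebacks)
print(grouped)
```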

ValueRaider commented 1 year ago

Questions: read https://github.com/ranaroussi/yfinance#timezone-cache-store. yfinance was trying to migrate from the old CSV-cache to newer SQL-cache. No CSV-cache = no migration.
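To make the "no CSV-cache = no migration" point concrete, here is a rough sketch of such a one-off migration written with the standard `csv` and `sqlite3` modules. It is not yfinance's implementation: the function name, the table schema and the rule of skipping malformed rows are all assumptions chosen for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def migrate_csv_cache(csv_path: str, conn: sqlite3.Connection) -> int:
    """Copy well-formed (ticker, tz) rows from an old CSV cache into a kv table.

    Returns the number of rows written. If the CSV file does not exist,
    there is nothing to migrate and 0 is returned.
    """
    path = Path(csv_path)
    if not path.is_file():
        return 0  # no CSV cache, so no migration
    conn.execute('create table if not exists "kv" (key TEXT NOT NULL PRIMARY KEY, value TEXT)')
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header row
        # Keep only rows with exactly two non-empty fields.
        records = [(r[0], r[1]) for r in reader if len(r) == 2 and r[0] and r[1]]
    with conn:  # commit on success, roll back on error
        conn.executemany('replace into "kv" (key, value) values (?,?)', records)
    return len(records)
```

Skipping bad rows silently is a design choice made here for brevity; logging them would make a damaged cache easier to notice.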

'SAFE WI': problem should be obvious. If you think yfinance can handle it better then create a specific Issue.

PeterSchober005 commented 1 year ago

@ValueRaider

I re-installed the latest release-version of yfinance and it is running like silk. It is not leading to the bug like the pre-release version!

Input:
start = time.time()
stockdata = yf.download(ticker_list_r3000,'2021-1-1', group_by="ticker",auto_adjust=True, threads=True)
stockdata.index=pd.DatetimeIndex(stockdata.index)
print(time.time()-start)

Output: [*********************100%***********************] 2634 of 2634 completed

25 Failed downloads:


There is no more bug like in https://github.com/ranaroussi/yfinance/issues/1526#issuecomment-1559588523!

There has also not been a bug in previous versions! The output simply said that there were a couple of failed downloads but nevertheless, the download was completed and the dataframe was constructed. The pre-release version generates a bug once there is a failed download.

Hence, can you please allow future versions of yfinance to continue the download without generating a bug if it fails to download certain symbols?

Once again, a very big thank you, @ValueRaider !

ValueRaider commented 1 year ago

"Hence, can you please allow future versions of yfinance to continue the download without generating a bug if it fails to download certain symbols?"

Me? No. Create a specific Issue and maybe someone can implement.

PeterSchober005 commented 1 year ago

Just did: https://github.com/ranaroussi/yfinance/issues/1537

I hope the issue is clear... Whenever a long list of tickers is passed to yf.download, chances are that some tickers will fail to download. In this case, yf.download should not raise an error as it does in the pre-release version 0.2.19b3, but should continue as it does in 0.2.18.
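The requested behaviour can be sketched independently of yfinance: fetch each ticker on its own, record failures, and still return whatever succeeded. The `fetch` callable, the helper names and the fake data are hypothetical stand-ins; in practice `fetch` could wrap a single-ticker download call.

```python
from typing import Callable, Dict, Iterable, Tuple, TypeVar

T = TypeVar("T")


def download_all(tickers: Iterable[str],
                 fetch: Callable[[str], T]) -> Tuple[Dict[str, T], Dict[str, str]]:
    """Call fetch(ticker) for every ticker; never abort on a single failure."""
    results: Dict[str, T] = {}
    failures: Dict[str, str] = {}
    for ticker in tickers:
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:  # record the failure and keep going
            failures[ticker] = repr(exc)
    return results, failures


def fake_fetch(ticker: str) -> int:
    """Stand-in for a real download; rejects symbols containing a space."""
    if " " in ticker:
        raise ValueError("No timezone found, symbol may be delisted")
    return len(ticker)


data, failed = download_all(["AAPL", "SAFE WI", "MSFT"], fake_fetch)
print(data)
print(failed)
```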