Closed jainraje closed 7 years ago
Update: I created test code to download historical data (1 year) for approximately 4000 symbols.
Here is a snippet of the error (failing at the 988th symbol):
983 of 3802 : DSPG
984 of 3802 : DST
985 of 3802 : DSU
986 of 3802 : DSW
987 of 3802 : DTE
988 of 3802 : DUG
Traceback (most recent call last):
File "test_yahoo_finnce_fix.py", line 23, in <module>
df = pdr.get_data_yahoo(row.Symbol, start=start_date)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/fix_yahoo_finance/__init__.py", line 101, in get_data_yahoo
dfs[ticker] = pd.read_csv(hist, index_col=0
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 411, in _read
data = parser.read(nrows)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 982, in read
ret = self._engine.read(nrows)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 1719, in read
data = self._reader.read(nrows)
File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2
I'm happy to run some additional tests if you can advise.
TIA --R
This seems to be an issue related to a malformed CSV returned by Yahoo (probably in cases of authentication or usage-limit problems). I've added a wait and re-authentication scheme in the hope that it solves the issue.
Upgrade using:
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
If it doesn't, try calling fix_yahoo_finance.get_yahoo_crumb(force=True) every 500 calls or so and see if this helps.
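The "every 500 calls or so" suggestion can be sketched as a counter around the download loop. This is only a rough sketch, not the library's own logic: `fetch` and `refresh` are hypothetical callables that you would supply yourself (for example, wrappers around pdr.get_data_yahoo and fix_yahoo_finance.get_yahoo_crumb(force=True)), and `every=500` is simply the number mentioned above.

```python
def download_all(tickers, fetch, refresh, every=500):
    """Call fetch(ticker) for each ticker, calling refresh() after each block of `every` calls.

    fetch and refresh are hypothetical stand-ins supplied by the caller.
    """
    results = {}
    for i, ticker in enumerate(tickers):
        # Re-authenticate once `every` calls have been made (never before the first call).
        if i > 0 and i % every == 0:
            refresh()
        results[ticker] = fetch(ticker)
    return results
```

Passing the two functions in keeps the counter logic independent of any particular library version.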
Hi Ran,
Thanks for your update. I will run a few tests today and communicate my results back to you.
Best, Rajeev
=== Rajeev Jain Jainraje@yahoo.com
Out of 3800 tickers, 7 failed. Here are a few snippets of the failed cases:
1017 of 3802 : EBS
Requesting EBS EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/EBS?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=hAuEssz\u002FWbk
Unknown string format failed to pull pricing data
1680 of 3802 : HUBB
Requesting HUBB EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/HUBB?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=KmkwymyN72p
'Volume' failed to pull pricing data
1721 of 3802 : IDE
Requesting IDE EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/IDE?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=qRHQYjmNhE2
Unknown string format failed to pull pricing data
2285 of 3802 : MTW
Requesting MTW EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/MTW?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=rT8SjAj4cJp
Unknown string format failed to pull pricing data
I ran it again for just the failed tickers and data was retrieved successfully. Perhaps the bad responses are related to server overload?
These are certainly better results. I've released a new version. Let's see if we can do better :)
Upgrade using:
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
I've been running your newer version for a few days. I see significant improvement. Very nice.
To be complete with my feedback, I still see a small number of symbols that fail to download on the first attempt and require a second pass.
Loop 1: 7273 tickers. Download 5 days of OHLCV for each ticker; on error, add the ticker to error_list.
Loop 2: run through error_list (33 tickers). Download 5 days of OHLCV for each ticker; on error, add the ticker to error_list.
after 2nd loop:
Question: Any thoughts on why 16 symbols could not be downloaded during the first pass and required a 2nd pass? That the remaining 17 are not downloaded makes sense, as they are not recognized by Yahoo Finance.
At any rate, your current fix is a welcome solution and is very workable. thank-you!
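For anyone wanting to reproduce the two-pass approach described above, here is a rough sketch. The function name download_in_passes and the `fetch` callable are hypothetical; `fetch` stands in for whatever download call you use (such as pdr.get_data_yahoo), and anything it raises is treated as a failure.

```python
def download_in_passes(tickers, fetch, passes=2):
    """Try every ticker, then retry only the failures, for up to `passes` passes.

    fetch is a hypothetical stand-in for the real download call.
    Returns (results, tickers that still failed after the last pass).
    """
    results = {}
    pending = list(tickers)
    for _ in range(passes):
        error_list = []
        for ticker in pending:
            try:
                results[ticker] = fetch(ticker)
            except Exception:
                # Any failure sends the ticker to the next pass.
                error_list.append(ticker)
        pending = error_list
        if not pending:
            break
    return results, pending
```

Tickers that fail only transiently are picked up on the second pass, while symbols Yahoo does not recognize remain in the returned failure list.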
I've added thread support in 0.0.9. Can you please check if this is working better for you?
Upgrade using:
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
and add threads=INT to your code (I use threads=len(tickers)//10 myself)
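One detail of the len(tickers)//10 rule of thumb: integer division gives 0 for fewer than 10 tickers. How the library treats threads=0 is not stated in this thread, so the following is just a defensive sketch; thread_count is a hypothetical helper, not part of fix_yahoo_finance.

```python
def thread_count(tickers, per_thread=10):
    """Return roughly one thread per `per_thread` tickers, but never fewer than 1."""
    # Integer division mirrors len(tickers)//10; max() keeps the result at 1 or more.
    return max(1, len(tickers) // per_thread)
```

The result could then be passed as the threads argument.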
I just updated to latest:
fix-yahoo-finance 0.0.13
I'm getting many 'Index' errors. See log below. Same code on my end which worked quite well with your 0.0.8. Am I doing something wrong? When I revert back to 0.0.8 no Index errors.
df = pdr.get_data_yahoo(str(ticker), start=start, end=end)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/fix_yahoo_finance/__init__.py", line 190, in download
data = pd.Panel(_DFS_)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 148, in __init__
minor_axis=minor_axis, copy=copy, dtype=dtype)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 173, in _init_data
mgr = self._init_dict(data, passed_axes, dtype=dtype)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 214, in _init_dict
for i, a in enumerate(axes)]
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 214, in <listcomp>
for i, a in enumerate(axes)]
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 1466, in _extract_axis
index = _get_combined_index(indexes, intersect=intersect)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py", line 43, in _get_combined_index
union = _union_indexes(indexes)
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py", line 74, in _union_indexes
return result.union_many(indexes[1:])
File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 1050, in union_many
if this.freq is None:
AttributeError: 'Index' object has no attribute 'freq'
Can you please check on v. 0.0.14?
Upgrade using:
$ pip install fix_yahoo_finance --upgrade --no-cache-dir
I upgraded to the latest version (i.e. 0.0.15) but the error still occurs. When I revert to 0.0.8, the price download works. Do I need to change my calling convention?
Here is the code (the actual ticker_list is much bigger):
ticker_list = ['A', 'AA', 'AAAP', 'AAC', 'AADR', 'AAL', 'AAMC', 'AAME', 'AAN', 'AAOI', 'AAON', 'AAP', 'AAPC', 'AAPL']
count = 0
error_list = []
total = len(ticker_list)
for ticker in ticker_list:
    count += 1
    print(count, 'of', total, ':', ticker)
    try:
        df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(str(e), 'failed to pull pricing data for', str(ticker))
        error_list.append(ticker)
print(error_list)
Please advise?
Your code works "as-is" on my laptop and trading server...
btw - why not just use this syntax:
data = pdr.get_data_yahoo(ticker_list, start=start_date, end=end_date)
...and let the module manage possible connection issues with Yahoo?
I'll try upgrading again and will retry the code. Can the downloader handle a list of over 7000 symbols?
It should :)
I upgraded to 0.0.16 and it is downloading however I am seeing a run-time warning when accessing/manipulating the DF which was not seen using 0.0.8.
Below is code to reproduce the warning, followed by the warning itself. Can you please look at this?
ticker = 'AAN'
start_date = '2017-06-12'
end_date = '2017-06-13'
df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
print('DF types:', df.dtypes)
print('DF columns', df.columns)
print('DF index', df.index) # <= line causing the warning
print('DF', df)
/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py:77: RuntimeWarning: Cannot compare type 'Timestamp' with type 'int', sort order is undefined for incomparable objects
result = result.union(other)
I can't reproduce the error. This code is running a-ok on both my laptop and trading server ...
As far as the 0.0.16 downloader is concerned it appears to be working well. I'm manipulating the resulting DF and this is when I see the warning. The warning is consistent on my mac laptop using 0.0.16 and is not present when using 0.0.8. I will look into this further. I appreciate you checking this out....Thx
I found the offending line of code. Basically I'm taking the resulting DF and manipulating (reshaping) it in preparation to write it to a mysql table.
Here is the code snippet producing the error:
ticker = 'AAN'
start_date = '2017-06-12'
end_date = '2017-06-13'
df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
df['price_date'] = df.index
df['index'] = range(1, len(df) + 1)
df.set_index('index', inplace=True)  # <= this line is causing the warning
The point worth understanding (and potentially changing, if it really is an issue) is why the warning does not appear with 0.0.8 but does with 0.0.16. Did some aspect of the DF change between versions?
If I make a copy of the DF and then perform my manipulation on the copy, there is no warning, so I'm less concerned about the issue. Your thoughts, comments or suggestions are welcome.
Using copies is the solution ;)
First of all, thank you for developing this "temporary" fix. I'm trying to download historical data (1yr) for 1000 symbols. I'm getting a periodic connection aborted after maybe ~200 symbols. After a (long) delay data download continues.
Here is the error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)) failed to pull pricing data
How can I eliminate the connection abort...Will you please advise? TIA --R
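No fix for the connection aborts is given in this thread. One common way to tolerate them, rather than eliminate them, is to retry with an increasing delay. The sketch below makes several assumptions: fetch_with_retry and `fetch` are hypothetical names, the attempt count and delays are arbitrary, and the underlying error is assumed to derive from OSError (true of Python's built-in ConnectionError, and of the requests library's exceptions on Python 3).

```python
import time


def fetch_with_retry(fetch, ticker, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(ticker), retrying on OSError with exponentially growing delays.

    fetch is a hypothetical stand-in for the real download call.
    """
    for attempt in range(attempts):
        try:
            return fetch(ticker)
        except OSError:
            # Give up and re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, then 4x, ... before trying again.
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default time.sleep applies.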