ranaroussi / yfinance

Download market data from Yahoo! Finance's API
https://aroussi.com/post/python-yahoo-finance
Apache License 2.0
14.59k stars, 2.43k forks

('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)) failed to pull pricing data #2

Closed jainraje closed 7 years ago

jainraje commented 7 years ago

First of all, thank you for developing this "temporary" fix. I'm trying to download 1 year of historical data for 1000 symbols. After roughly 200 symbols I periodically get a connection abort. After a (long) delay the download continues.

Here is the error: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',)) failed to pull pricing data

How can I eliminate the connection aborts? Will you please advise? TIA --R

jainraje commented 7 years ago

Update: I created test code to download 1 year of historical data for approximately 4000 symbols.

Here is a snippet of the error (failing at the 988th symbol):

983 of 3802 : DSPG
984 of 3802 : DST
985 of 3802 : DSU
986 of 3802 : DSW
987 of 3802 : DTE
988 of 3802 : DUG
Traceback (most recent call last):
  File "test_yahoo_finnce_fix.py", line 23, in <module>
    df = pdr.get_data_yahoo(row.Symbol, start=start_date)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/fix_yahoo_finance/__init__.py", line 101, in get_data_yahoo
    dfs[ticker] = pd.read_csv(hist, index_col=0
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 655, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 411, in _read
    data = parser.read(nrows)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 982, in read
    ret = self._engine.read(nrows)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/io/parsers.py", line 1719, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 890, in pandas._libs.parsers.TextReader.read (pandas/_libs/parsers.c:10862)
  File "pandas/_libs/parsers.pyx", line 912, in pandas._libs.parsers.TextReader._read_low_memory (pandas/_libs/parsers.c:11138)
  File "pandas/_libs/parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas/_libs/parsers.c:11884)
  File "pandas/_libs/parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas/_libs/parsers.c:11755)
  File "pandas/_libs/parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas/_libs/parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 4, saw 2

I'm happy to run some additional tests if you can advise.

TIA --R

ranaroussi commented 7 years ago

This seems to be an issue related to a misformatted CSV returned by Yahoo (probably in case of authentication/usage issues). I've added a wait and re-authentication scheme in the hope that it solves the issue.

Upgrade using:

$ pip install fix_yahoo_finance --upgrade --no-cache-dir

If it doesn't, try calling fix_yahoo_finance.get_yahoo_crumb(force=True) every 500 calls or so and see if this helps.
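The periodic re-authentication suggested above could be wired into a download loop roughly as follows. This is a minimal sketch, not part of the library: `download_with_refresh`, `fetch`, and `refresh` are hypothetical names, and the caller is assumed to pass something like `lambda: fix_yahoo_finance.get_yahoo_crumb(force=True)` as `refresh`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def download_with_refresh(
    tickers: Iterable[str],
    fetch: Callable[[str], object],
    refresh: Callable[[], None],
    every: int = 500,
) -> Tuple[Dict[str, object], List[str]]:
    """Fetch each ticker, calling refresh() after every `every` requests."""
    results: Dict[str, object] = {}
    failed: List[str] = []
    for n, ticker in enumerate(tickers, start=1):
        try:
            results[ticker] = fetch(ticker)
        except Exception as exc:
            # Record the failure and keep going with the next ticker.
            print(exc, "failed to pull pricing data for", ticker)
            failed.append(ticker)
        if n % every == 0:
            # Assumed to force a new crumb, as suggested in the comment above.
            refresh()
    return results, failed
```

Passing the two callables in keeps the loop independent of any particular library version, so the interval of 500 can be tuned without touching the download code.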

jainraje commented 7 years ago

Hi Ran,

Thanks for your update. I will run a few tests today and communicate my results back to you.

Best, Rajeev

jainraje commented 7 years ago

Out of 3800 tickers, 7 failed. Here are a few snippets of the failed cases:

1017 of 3802 : EBS
Requesting EBS EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/EBS?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=hAuEssz\u002FWbk
Unknown string format failed to pull pricing data

1680 of 3802 : HUBB
Requesting HUBB EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/HUBB?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=KmkwymyN72p
'Volume' failed to pull pricing data

1721 of 3802 : IDE
Requesting IDE EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/IDE?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=qRHQYjmNhE2
Unknown string format failed to pull pricing data

2285 of 3802 : MTW
Requesting MTW EOD data from yahoo start: 1990-01-01 end: 2017-05-30
https://query1.finance.yahoo.com/v7/finance/download/MTW?period1=631180800&period2=1496127600&interval=1d&events=history&crumb=rT8SjAj4cJp
Unknown string format failed to pull pricing data

I ran it again for just the failed tickers and data was retrieved successfully. Perhaps the bad responses are related to server overload?
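If the bad responses are indeed transient, one option is to retry each request a few times with a growing pause before giving up. The sketch below is an illustration only: `fetch_with_retry` and its parameters are hypothetical, and the doubling delay is an assumption rather than anything the library is known to do.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def fetch_with_retry(
    fetch: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch(); on failure wait base_delay * 2**k seconds, then retry."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call would look like `fetch_with_retry(lambda: pdr.get_data_yahoo('EBS', start=start_date))`. The `sleep` parameter exists only so the waiting can be replaced in tests.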

ranaroussi commented 7 years ago

These are certainly better results. I've released a new version. Let's see if we can do better :)

Upgrade using:

$ pip install fix_yahoo_finance --upgrade --no-cache-dir

jainraje commented 7 years ago

I've been running your newer version for a few days. I see significant improvement. Very nice.

To be complete with my feedback, I still see a small number of symbols that fail to download and require a second pass.

Loop 1 (7273 tickers):
    download 5 days of OHLCV for each ticker
    on error, add the ticker to error_list

Loop 2 (the 33 tickers in error_list):
    download 5 days of OHLCV for each ticker
    on error, add the ticker to error_list

After the 2nd loop, 17 tickers remain in error_list.

Question: Any thoughts on why 16 symbols could not be downloaded during the first pass and required a 2nd pass? That the remaining 17 are not downloaded makes sense, as they are not recognized by Yahoo Finance.

At any rate, your current fix is a welcome solution and is very workable. Thank you!
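The two-loop approach described in this comment can be written as a single function that re-runs only the failures. This is a sketch under assumptions: `download_in_passes` and `fetch` are hypothetical names, and `fetch` stands in for a call such as `pdr.get_data_yahoo(ticker, ...)`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def download_in_passes(
    tickers: Iterable[str],
    fetch: Callable[[str], object],
    passes: int = 2,
) -> Tuple[Dict[str, object], List[str]]:
    """Download every ticker, then retry only the failures on later passes."""
    results: Dict[str, object] = {}
    pending = list(tickers)
    for _ in range(passes):
        error_list: List[str] = []
        for ticker in pending:
            try:
                results[ticker] = fetch(ticker)
            except Exception:
                error_list.append(ticker)
        pending = error_list  # the next pass sees only what failed
        if not pending:
            break
    return results, pending
```

Whatever is still in the returned list after the last pass corresponds to the symbols that never downloaded, such as tickers Yahoo does not recognize.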

ranaroussi commented 7 years ago

I've added thread support in 0.0.9. Can you please check if this is working better for you?

Upgrade using:

$ pip install fix_yahoo_finance --upgrade --no-cache-dir

and add threads=INT to your code (I use threads=len(tickers)//10 myself)
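To illustrate the thread-count rule of thumb, here is a standalone sketch built on the standard library. It is not the library's internal implementation: `thread_count`, `download_threaded`, and `fetch` are hypothetical names. Note that `len(tickers)//10` evaluates to 0 for fewer than 10 tickers, so the sketch clamps the value to at least 1.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def thread_count(tickers: List[str]) -> int:
    """One thread per ten tickers, but never fewer than one."""
    return max(1, len(tickers) // 10)


def download_threaded(
    tickers: List[str], fetch: Callable[[str], object]
) -> Dict[str, object]:
    """Run fetch(ticker) across a thread pool and map tickers to results."""
    with ThreadPoolExecutor(max_workers=thread_count(tickers)) as pool:
        # pool.map yields results in input order, so zip pairs them correctly.
        return dict(zip(tickers, pool.map(fetch, tickers)))
```

In this sketch an exception raised by `fetch` propagates to the caller when its result is read, so it would need to be combined with per-ticker error handling for a large list.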

jainraje commented 7 years ago

I just updated to latest: fix-yahoo-finance 0.0.13

I'm getting many 'Index' errors; see the log below. The same code on my end worked quite well with your 0.0.8. Am I doing something wrong? When I revert to 0.0.8, there are no Index errors.

    df = pdr.get_data_yahoo(str(ticker), start=start, end=end)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/fix_yahoo_finance/__init__.py", line 190, in download
    data = pd.Panel(_DFS_)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 148, in __init__
    minor_axis=minor_axis, copy=copy, dtype=dtype)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 173, in _init_data
    mgr = self._init_dict(data, passed_axes, dtype=dtype)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 214, in _init_dict
    for i, a in enumerate(axes)]
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 214, in <listcomp>
    for i, a in enumerate(axes)]
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/panel.py", line 1466, in _extract_axis
    index = _get_combined_index(indexes, intersect=intersect)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py", line 43, in _get_combined_index
    union = _union_indexes(indexes)
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py", line 74, in _union_indexes
    return result.union_many(indexes[1:])
  File "/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/datetimes.py", line 1050, in union_many
    if this.freq is None:
AttributeError: 'Index' object has no attribute 'freq'

ranaroussi commented 7 years ago

Can you please check on v. 0.0.14?

Upgrade using:

$ pip install fix_yahoo_finance --upgrade --no-cache-dir

jainraje commented 7 years ago

I upgraded to the latest version (i.e. 0.0.15), but the error still occurs. When I revert to 0.0.8, the price download works. Do I need to change my calling convention?

Here is the code (the actual ticker_list is much bigger):

ticker_list = ['A', 'AA', 'AAAP', 'AAC', 'AADR', 'AAL', 'AAMC', 'AAME', 'AAN', 'AAOI', 'AAON', 'AAP', 'AAPC', 'AAPL']

count = 0
error_list = []
total = len(ticker_list)
for ticker in ticker_list:
    count += 1
    print(count, 'of', total, ':', ticker)
    try:
        df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)
    except Exception as e:
        print(str(e), 'failed to pull pricing data for', str(ticker))
        error_list.append(ticker)

print(error_list)

Please advise?

ranaroussi commented 7 years ago

Your code works "as-is" on my laptop and trading server...

btw - why not just use this syntax:

data = pdr.get_data_yahoo(ticker_list, start=start_date, end=end_date)

...and let the module manage possible connection issues with Yahoo?

jainraje commented 7 years ago

I'll try upgrading again and will retry the code. Can the downloader handle a list of over 7000 symbols?

ranaroussi commented 7 years ago

It should :)

jainraje commented 7 years ago

I upgraded to 0.0.16 and it is downloading. However, I am seeing a run-time warning when accessing/manipulating the DF, which I did not see with 0.0.8.

Below is code to reproduce the warning, followed by the warning itself. Can you please look at this?

ticker = 'AAN'
start_date = '2017-06-12'
end_date = '2017-06-13'
df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)

print('DF types:', df.dtypes)
print('DF columns', df.columns)
print('DF index', df.index) # <= line causing the warning
print('DF', df)

/Users/rajeev/anaconda/lib/python3.5/site-packages/pandas/core/indexes/api.py:77: RuntimeWarning: Cannot compare type 'Timestamp' with type 'int', sort order is undefined for incomparable objects
  result = result.union(other)

ranaroussi commented 7 years ago

I can't reproduce the error. This code is running a-ok on both my laptop and trading server ...

jainraje commented 7 years ago

As far as the 0.0.16 downloader is concerned it appears to be working well. I'm manipulating the resulting DF and this is when I see the warning. The warning is consistent on my mac laptop using 0.0.16 and is not present when using 0.0.8. I will look into this further. I appreciate you checking this out....Thx

jainraje commented 7 years ago

I found the offending line of code. Basically, I'm taking the resulting DF and manipulating (reshaping) it in preparation for writing it to a MySQL table.

Here is the code snippet producing the warning:

ticker = 'AAN'
start_date = '2017-06-12'
end_date = '2017-06-13'
df = pdr.get_data_yahoo(ticker, start=start_date, end=end_date)

df['price_date'] = df.index
df['index'] = range(1, len(df) + 1)
df.set_index('index', inplace=True)  # <= this line is causing the warning

The point worth understanding (and potentially changing, if it is really an issue) is why the warning is not there for 0.0.8 but is for 0.0.16. Did some aspect of the DF change between versions?

If I make a copy first and then perform my manipulation, there is no warning, so I'm less concerned about the issue. Your thoughts, comments or suggestions are welcome.

ranaroussi commented 7 years ago

Using copies is the solution ;)
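The copy-based workaround might look like the sketch below. It assumes pandas is installed. `reshape_for_sql` is a hypothetical helper, and the small frame with made-up prices stands in for the result of `pdr.get_data_yahoo(...)`. The sketch only shows the copy-then-reshape pattern and does not explain why the warning appeared in 0.0.16.

```python
import pandas as pd


def reshape_for_sql(prices: pd.DataFrame) -> pd.DataFrame:
    """Return a reshaped copy, leaving the downloaded frame untouched."""
    out = prices.copy()
    out["price_date"] = out.index           # keep the dates as a column
    out["index"] = range(1, len(out) + 1)   # 1-based row number
    return out.set_index("index")           # no inplace=True on the original


# Stand-in data with made-up values, indexed by date like the download result.
prices = pd.DataFrame(
    {"Close": [35.1, 35.4]},
    index=pd.to_datetime(["2017-06-12", "2017-06-13"]),
)
reshaped = reshape_for_sql(prices)
```

Because all changes are made on the copy, the original frame keeps its date index and can still be reused after the reshaped version is written out.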