pydata / pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.
https://pydata.github.io/pandas-datareader/stable/index.html

followup on Yahoo DataReader issues #356

Closed jreback closed 6 years ago

jreback commented 7 years ago

in #355 I xfailed a couple of tests as they were not pulling data. These should be investigated before the release.

cc @rgkimball

jrovegno commented 7 years ago

When I run pdr.get_data_yahoo('SPY') I get this error:

ConnectionError: HTTPConnectionPool(host='ichart.finance.yahoo.com', port=80): Max retries exceeded with url: /table.csv?s=SPY&a=0&b=1&c=2010&d=6&e=4&f=2017&g=d&ignore=.csv (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x0000000009DE1198>: Failed to establish a new connection: [Errno 11004] getaddrinfo failed',))

Similar problem in the docs.

gliptak commented 7 years ago

@jreback Do you plan to publish 0.5.0 soon? Thanks

https://pypi.python.org/pypi/pandas-datareader

jreback commented 7 years ago

yes if this can be resolved soon

gliptak commented 7 years ago

I cannot reproduce the error above with current master (installed as pip install git+git://github.com/pydata/pandas-datareader.git):

➜ pip list | grep pandas-datareader
pandas-datareader (0.5.0)
➜ jupyter-console                  
Jupyter console 5.1.0

Python 2.7.12+ (default, Sep 17 2016, 12:08:02) 
Type "copyright", "credits" or "license" for more information.

IPython 5.4.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import pandas_datareader as pdr

In [2]: pdr.get_data_yahoo('SPY').tail()
Out[2]: 
                  Open        High         Low       Close   Adj Close  \
Date                                                                     
2017-06-27  243.039993  243.380005  241.309998  241.330002  241.330002   
2017-06-28  242.500000  243.720001  242.229996  243.490005  243.490005   
2017-06-29  243.660004  243.720001  239.960007  241.350006  241.350006   
2017-06-30  242.279999  242.710007  241.580002  241.800003  241.800003   
2017-07-03  242.880005  243.380005  242.210007  242.210007  242.210007   

               Volume  
Date                   
2017-06-27   82247700  
2017-06-28   70042600  
2017-06-29  103933000  
2017-06-30   86820700  
2017-07-03   39147200  

In [3]: 

rgkimball commented 7 years ago

Based on the URI, this appears to be the previous API - I believe you just need the latest updates.
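To see why the URI points at the previous API, the request from the traceback above can be taken apart. This is only a sketch: the URL is rebuilt from the host and path shown in the error message, and reading a/b/c and d/e/f as zero-based month, day, and year is an assumption about the legacy ichart query format, not something stated in this thread.

```python
from datetime import date
from urllib.parse import urlsplit, parse_qs

# URL rebuilt from the host and path in the ConnectionError above.
url = ("http://ichart.finance.yahoo.com/table.csv"
       "?s=SPY&a=0&b=1&c=2010&d=6&e=4&f=2017&g=d&ignore=.csv")

parts = urlsplit(url)
query = {key: values[0] for key, values in parse_qs(parts.query).items()}

# Assumption: a/d are zero-based months, b/e are days, c/f are years.
start = date(int(query["c"]), int(query["a"]) + 1, int(query["b"]))
end = date(int(query["f"]), int(query["d"]) + 1, int(query["e"]))

print(parts.hostname, query["s"], start, end)
```

A request that still goes to ichart.finance.yahoo.com suggests the installed package predates the updates on master.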

(Sent by email on Jul 4, 2017, at 2:44 PM, in reply to Gábor Lipták's comment above.)

jreback commented 7 years ago

You have to run the tests and see what is xfailed.

These are all related to dividends in Yahoo.

gliptak commented 7 years ago

Yahoo works inconsistently ... For example:

https://github.com/pydata/pandas-datareader/blob/master/pandas_datareader/tests/yahoo/test_yahoo.py#L215 (SPLIT) returns 0.0 and 0.14285714

https://github.com/pydata/pandas-datareader/blob/master/pandas_datareader/tests/yahoo/test_yahoo.py#L112 returns 251 and 252 (missing and containing 2013-12-31)

How would you like the tests updated to handle this? Thanks

https://travis-ci.org/gliptak/pandas-datareader/jobs/250111075

jreback commented 7 years ago

(pandas) bash-3.2$ ./test.sh 
=========================================================================================== test session starts ===========================================================================================
platform darwin -- Python 3.6.1, pytest-3.1.2, py-1.4.34, pluggy-0.4.0
rootdir: /Users/jreback/pandas-datareader, inifile:
plugins: cov-2.3.1, xdist-1.16.0
collected 110 items 

pandas_datareader/tests/test_base.py ....
pandas_datareader/tests/test_data.py .sX...
pandas_datareader/tests/test_edgar.py ssss
pandas_datareader/tests/test_enigma.py ssss
pandas_datareader/tests/test_eurostat.py ....
pandas_datareader/tests/test_famafrench.py ......
pandas_datareader/tests/test_fred.py ..s..s.
pandas_datareader/tests/test_nasdaq.py .
pandas_datareader/tests/test_oanda.py ss
pandas_datareader/tests/test_oecd.py ...
pandas_datareader/tests/test_tsp.py ..
pandas_datareader/tests/test_wb.py ........
pandas_datareader/tests/google/test_google.py ............
pandas_datareader/tests/google/test_options.py .........
pandas_datareader/tests/io/test_jsdmx.py ..
pandas_datareader/tests/io/test_sdmx.py .
pandas_datareader/tests/yahoo/test_options.py ...............
pandas_datareader/tests/yahoo/test_yahoo.py ......sss..x..XX..X.
========================================================================================= short test summary info =========================================================================================
XFAIL pandas_datareader/tests/yahoo/test_yahoo.py::TestYahoo::()::test_get_data_interval
  failing after #355
XPASS pandas_datareader/tests/test_data.py::TestDataReader::()::test_read_yahoo_dividends failing after #355
XPASS pandas_datareader/tests/yahoo/test_yahoo.py::TestYahoo::()::test_get_date_ret_index failing after #355
XPASS pandas_datareader/tests/yahoo/test_yahoo.py::TestYahoo::()::test_get_data_yahoo_actions failing after #355
XPASS pandas_datareader/tests/yahoo/test_yahoo.py::TestYahoo::()::test_yahoo_DataReader failing after #355

Right, so for the XPASS ones you can probably just take the decorator off and add @skip_on_exception(RemoteDataError). These happened to fail at times and were annoying; they look correct though.

The get_data_interval test is consistently failing though, so it should be fixed (not sure what is wrong).
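As an illustration of the idea behind swapping the xfail marker for a skip-on-exception decorator, here is a minimal, self-contained sketch. It is not the project's actual helper: both RemoteDataError and skip_on_exception below are stand-ins written for this example, and the test function is hypothetical.

```python
import functools
import unittest


class RemoteDataError(IOError):
    """Stand-in for the library's remote-data exception (defined here for the sketch)."""


def skip_on_exception(exc_type):
    """Report a test as skipped, not failed, when exc_type is raised inside it."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except exc_type as err:
                # unittest.SkipTest is the standard-library signal for a skipped test.
                raise unittest.SkipTest("skipped on {}: {}".format(exc_type.__name__, err))
        return wrapper
    return decorator


@skip_on_exception(RemoteDataError)
def test_flaky_remote_call():
    # Hypothetical test body: simulate the remote source returning nothing.
    raise RemoteDataError("no data fetched")
```

Unlike xfail, this only skips when the remote source itself misbehaves; any other exception or a failed assertion still counts as a real failure.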

jreback commented 7 years ago

I think for the inconsistencies you can just make the test less strict, e.g. accept 251 or 252 (odd that Yahoo is like that); just add a comment about this.
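A relaxed check along these lines might look like the following sketch. The helper name assert_row_count is hypothetical and only illustrates accepting either count; it is not taken from the test suite.

```python
def assert_row_count(n_rows, accepted=(251, 252)):
    """Accept either row count.

    Yahoo inconsistently includes or omits 2013-12-31, so the result
    has 251 or 252 rows depending on the response.
    """
    assert n_rows in accepted, "unexpected row count: {}".format(n_rows)


# Both observed counts pass; anything else still fails the test.
assert_row_count(251)
assert_row_count(252)
```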

rgkimball commented 7 years ago

My only guess is that we need to control for a time zone in the Unix timestamps.

(Sent by email on Jul 4, 2017, at 4:56 PM, in reply to Jeff Reback's comment above.)

gliptak commented 7 years ago

@rgkimball https://github.com/pydata/pandas-datareader/blob/master/pandas_datareader/yahoo/daily.py#L100

As example values, self.start comes in as 2010-01-01 00:00:00 and self.end comes in as 2013-01-27 00:00:00

Maybe self.end should be changed to 2013-01-27 23:59:59? Just during the conversion, or earlier in the code?

This doesn't introduce time zones (yet) ...
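The end-of-day adjustment could be sketched as below. This is an illustration under assumptions, not the code in daily.py: the function name to_unix_range is hypothetical, and the naive datetimes are treated as UTC, which sidesteps the time zone question rather than answering it.

```python
import calendar
from datetime import datetime


def to_unix_range(start, end):
    """Convert naive datetimes to Unix timestamps, extending end to 23:59:59.

    Assumption: both values are interpreted as UTC.
    """
    end_of_day = end.replace(hour=23, minute=59, second=59)
    start_ts = calendar.timegm(start.timetuple())
    end_ts = calendar.timegm(end_of_day.timetuple())
    return start_ts, end_ts


# Example values from the comment above.
start_ts, end_ts = to_unix_range(datetime(2010, 1, 1), datetime(2013, 1, 27))
print(start_ts, end_ts)
```

Applying the shift only during the conversion leaves self.end unchanged for any other code that reads it.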

jreback commented 7 years ago

If someone has a chance... this (and #296) are what remain for 0.5.0.

gliptak commented 7 years ago

Yahoo get_components is also broken https://github.com/pydata/pandas-datareader/issues/238

gliptak commented 7 years ago

@jreback Could current master be released as 0.5.0? It is an improvement, as Yahoo currently doesn't work at all ... Thanks

jreback commented 7 years ago

I think the remainder of the xfails should be addressed. Then we can do the release.

bashtage commented 6 years ago

Yahoo has been deprecated.