pydata / parallel-tutorial

Parallel computing in Python tutorial materials

prep.py Error #30

Open nyirock opened 6 years ago

nyirock commented 6 years ago

It seems that Google has updated their API, so running prep.py raises a remote error:

    raise RemoteDataError('Unable to read URL: {0}'.format(url))
    pandas_datareader._utils.RemoteDataError: Unable to read URL: http://www.google.com/finance/historical?q=usb&startdate=Jan+27%2C+2017&enddate=Jan+27%2C+2018&output=csv

Is there a way the offline versions of the JSON files could be made available?

isunli commented 6 years ago

Same here. Can anyone fix this problem?

mdtdev commented 5 years ago

Clearly the data source is no longer supported. Does anyone know an alternate source to use for the data? Or equivalent data to download?

mrocklin commented 5 years ago

I don't personally know of a good place to download this data, but I wouldn't be surprised if one exists.

The dask repository now includes a dask.datasets.timeseries function that generates entirely fake data that might fit in, though it would be less interesting. If someone wants to do this, I suspect it would be welcome.

cdeil commented 5 years ago

I also wanted to try this tutorial, but couldn't get the data:

(parallel) hfm-1804a:parallel-tutorial deil$ python prep.py
Traceback (most recent call last):
  File "prep.py", line 21, in <module>
    dask.set_options(get=dask.multiprocessing.get)
  File "/Users/deil/software/anaconda3/envs/parallel/lib/python3.6/site-packages/dask/context.py", line 18, in set_options
    raise TypeError("The dask.set_options function has been deprecated.\n"
TypeError: The dask.set_options function has been deprecated.
Please use dask.config.set instead

  Before: with dask.set_options(foo='bar'):
              ...
  After:  with dask.config.set(foo='bar'):
              ...

I don't personally know of a good place to download this data, but I wouldn't be surprised if one exists.

How big is the data that was downloaded by prep.py? If it's less than 1 GB, maybe you could just put a copy in this GitHub repo?

It would be great to have this tutorial working.
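For anyone hitting the same TypeError: the deprecated dask.set_options call on line 21 of prep.py can be rewritten with the new config API. A sketch, assuming the "processes" scheduler name is the intended replacement for dask.multiprocessing.get:

```python
import dask

# Old (removed): dask.set_options(get=dask.multiprocessing.get)
# New config API; "processes" names the multiprocessing scheduler.
dask.config.set(scheduler="processes")
print(dask.config.get("scheduler"))

# dask.config.set also works as a context manager, scoping the setting
# to the enclosed block, as the error message's Before/After shows:
with dask.config.set(scheduler="processes"):
    pass
```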

mrocklin commented 5 years ago

I agree that putting the data into the repository is possible. Unfortunately I no longer know how to obtain the data. My recommendation that someone rework the examples to use the dask.datasets.timeseries function is, I think, still the best approach. Alternate solutions would be welcome if people want to implement them.

cdeil commented 5 years ago

> I agree that putting the data into the repository is possible. Unfortunately I no longer know how to obtain the data.

@minrk - maybe you still have a copy of the files around?

> My recommendation that someone rework the examples to use the dask.datasets.timeseries function is, I think, still the best approach I can think of personally.

I could try tomorrow. But to me, bundling example data in the tutorial repo seems like the better solution if it's small, to increase the chances of it working in the future.

mrocklin commented 5 years ago

dask.datasets.timeseries produces random data using the numpy.random module. It's definitely as robust as packaging data, and has the benefit of working over conference wifi.

I think it's ok to have a few megabytes of data here, but we need to expect this tutorial to be run over very poor internet connections. Anything over a few tens of megabytes is unpleasant.

jjbankert commented 5 years ago

In order to even get to the Google error, I've set dask=0.20.2 and pandas=0.22 in the environment.yml file. Dask ran into the same issue as @cdeil reported, and pandas raised the following exception:

(parallel) [parallel-tutorial]$ python prep.py
Traceback (most recent call last):
  File "prep.py", line 44, in <module>
    write_stock(symbol)
  File "prep.py", line 37, in write_stock
    data_source='google')
  File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/dask/dataframe/io/demo.py", line 202, in daily_stock
    from pandas_datareader import data
  File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/__init__.py", line 2, in <module>
    from .data import (DataReader, Options, get_components_yahoo,
  File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/data.py", line 14, in <module>
    from pandas_datareader.fred import FredReader
  File "/opt/anaconda3/envs/parallel/lib/python3.6/site-packages/pandas_datareader/fred.py", line 1, in <module>
    from pandas.core.common import is_list_like
ImportError: cannot import name 'is_list_like'
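That ImportError comes from pandas 0.23+ removing is_list_like from pandas.core.common, while older pandas_datareader releases (before 0.7) still import it from there. Upgrading pandas_datareader is the clean fix; as a stopgap, a commonly used shim looks like the following, applied before importing pandas_datareader:

```python
import pandas as pd

# pandas >= 0.23 moved is_list_like to pandas.api.types; restore the
# old location that pandas_datareader < 0.7 expects to import from.
pd.core.common.is_list_like = pd.api.types.is_list_like

# import pandas_datareader  # now imports cleanly with the shim in place
```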