shlomiku / zipline-trader

Zipline Trader, a Pythonic Algorithmic Trading Library with broker integration
https://github.com/shlomikushchi/zipline-trader
Apache License 2.0

universe SP500 load data error from 2016-01-03 to 2017-01-03: AttributeError: 'Index' object has no attribute 'normalize' #190

Open arisliang opened 3 years ago

arisliang commented 3 years ago

When loading data for the SP 500 universe from 2016-01-03 to 2017-01-03, it gives 'Index' object has no attribute 'normalize':

    C:\Users\arisl\anaconda3\envs\zipline-trader\python.exe C:/src/lycn/conda-environments/zipline-trader/src/zipline-trader/zipline/data/bundles/alpaca_api.py
    C:/src/lycn/conda-environments/zipline-trader/src/zipline-trader/zipline/data/bundles/alpaca_api.py:276: UserWarning: Overwriting bundle with name 'alpaca_api'
      def api_to_bundle(interval=['1m']):
    C:/src/lycn/conda-environments/zipline-trader/src/zipline-trader/zipline/data/bundles/alpaca_api.py:355: UserWarning: Overwriting bundle with name 'alpaca_api'
      end_session=end_date
    Traceback (most recent call last):
      File "C:/src/lycn/conda-environments/zipline-trader/src/zipline-trader/zipline/data/bundles/alpaca_api.py", line 363, in <module>
        show_progress=True,
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\bundles\core.py", line 513, in ingest
        pth.data_path([name, timestr], environ=environ),
      File "C:/src/lycn/conda-environments/zipline-trader/src/zipline-trader/zipline/data/bundles/alpaca_api.py", line 306, in ingest
        daily_bar_writer.write(daily_data_generator(), assets=assets_to_sids.values(), show_progress=True)
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 617, in write
        return self._write_internal(it, assets)
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 665, in _write_internal
        for asset_id, table in iterator:
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 658, in iterator
        for asset_id, table in iterator:
      File "C:\Users\arisl\anaconda3\envs\zipline-trader\lib\site-packages\click\_termui_impl.py", line 315, in generator
        for rv in self.iter:
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 609, in <genexpr>
        for sid, df in data
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 716, in _write_to_postgres
        result = self._format_df_columns_and_index(data, sid)
      File "C:\src\lycn\conda-environments\zipline-trader\src\zipline-trader\zipline\data\psql_daily_bars.py", line 806, in _format_df_columns_and_index
        data.index = data.index.normalize()
    AttributeError: 'Index' object has no attribute 'normalize'
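For what it's worth, the failure can be reproduced outside the ingest pipeline with a plain pandas Index (a minimal sketch, not taken from zipline-trader itself):

    import pandas as pd

    # A plain (object-dtype) Index has no normalize(); only DatetimeIndex does.
    idx = pd.Index(["2016-01-04", "2016-01-05"])
    try:
        idx.normalize()
    except AttributeError as err:
        print(err)  # 'Index' object has no attribute 'normalize'

    # The same dates held in a DatetimeIndex normalize without complaint.
    dt_idx = pd.DatetimeIndex(["2016-01-04", "2016-01-05"], tz="UTC")
    print(dt_idx.normalize())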

shlomiku commented 3 years ago

Hi, you should check your pandas version.

A pandas DatetimeIndex has a normalize method.

Do you know what symbol it happened with? Maybe the data is bad for that asset.
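A quick way to check both things at once (just a sketch):

    import pandas as pd

    # Installed pandas version.
    print(pd.__version__)

    # normalize() is defined on DatetimeIndex, but not on a plain Index.
    print(hasattr(pd.DatetimeIndex([]), "normalize"))      # True
    print(hasattr(pd.Index(["2016-01-04"]), "normalize"))  # False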

charlienewey commented 3 years ago

I am experiencing this issue too. It's happening while trying to load the last 4 years' worth of S&P 500 data.

@shlomikushchi, the AttributeError is raised on an Index, not a DatetimeIndex. I am wondering if there is some point in the code where a DatetimeIndex should be assigned but isn't.
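One way that could happen (a hypothetical illustration, not traced to a specific line in psql_daily_bars.py): if a per-asset frame comes back from the data source with string timestamps, pandas keeps them in a plain object Index, and the later normalize() call fails:

    import pandas as pd

    # Hypothetical per-asset frame whose timestamps arrived as strings.
    df = pd.DataFrame({"close": [100.0, 101.5]}, index=["2016-01-04", "2016-01-05"])
    print(type(df.index))  # <class 'pandas.core.indexes.base.Index'>

    # Only after an explicit conversion is there a DatetimeIndex to normalize.
    df.index = pd.to_datetime(df.index, utc=True)
    print(df.index.normalize())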

fimmugit commented 2 years ago

I also encountered this problem when ingesting ALL stocks. For a smaller set of stocks, e.g. the SP500, it runs fine and takes a couple of minutes.

It turns out that the problem happens when the data index is of type pandas.core.indexes.base.Index. To solve it, I changed line 806 in zipline-trader/zipline/data/psql_daily_bars.py:

from

     data.index = data.index.normalize()

to

    if isinstance(data.index, pd.core.indexes.base.Index):
        data.index = pd.to_datetime(data.index, utc=True)
    data.index = data.index.normalize()

It seems to work.
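A standalone check of the same conversion, using a made-up frame with the kind of string index that trips up line 806 (a sketch under those assumptions, not the project's test code):

    import pandas as pd

    data = pd.DataFrame(
        {"open": [1.0], "high": [1.2], "low": [0.9], "close": [1.1], "volume": [1000]},
        index=["2016-01-04 14:30:00"],  # string timestamp -> plain object Index
    )

    # The guard added above: coerce to a DatetimeIndex before normalizing.
    if isinstance(data.index, pd.core.indexes.base.Index):
        data.index = pd.to_datetime(data.index, utc=True)
    data.index = data.index.normalize()

    print(data.index)  # DatetimeIndex(['2016-01-04 00:00:00+00:00'], ...)

Note that DatetimeIndex is itself a subclass of Index, so the isinstance guard always fires; a tighter check would be `not isinstance(data.index, pd.DatetimeIndex)`, but re-running pd.to_datetime(..., utc=True) on an index that already holds datetimes just leaves it a UTC DatetimeIndex.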

Please note that ingesting ALL stocks takes a long time (about 1:30) even for one year's worth of data. I am not sure whether the time goes mostly to downloading data from Alpaca (which I suspect is possible due to Alpaca's rate limit) or to populating the postgres database. I subscribe to other data providers and may try downloading data as text files and populating postgres from those; in my previous experience, downloads from those providers take a reasonable amount of time.

One other thing worth considering is to pre-scan the universe based on some criteria to reduce the number of stocks to be ingested.
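As a rough sketch of that idea (the metadata columns and thresholds below are made up, not part of zipline-trader):

    import pandas as pd

    # Hypothetical per-symbol screening stats gathered before ingestion.
    universe = pd.DataFrame({
        "symbol": ["AAPL", "XYZ", "ABC"],
        "avg_dollar_volume": [5e9, 2e5, 8e7],
        "last_close": [150.0, 0.4, 12.0],
    })

    # Keep only reasonably liquid, non-penny names and ingest just those.
    mask = (universe["avg_dollar_volume"] > 1e6) & (universe["last_close"] > 1.0)
    symbols_to_ingest = universe.loc[mask, "symbol"].tolist()
    print(symbols_to_ingest)  # ['AAPL', 'ABC']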