stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

Getting UnsortedIndexError when trying to read/filter assets.h5 file in 04_alpha_factor_research/00_data/feature_engineering notebook #9

Closed. jeffreybreen closed this issue 5 years ago.

jeffreybreen commented 5 years ago

Hi,

I created the assets.h5 file with data/create_datasets.ipynb and it looks fine:

$ h5ls -f assets.h5 
/fred                    Group
/quandl                  Group
/sp500                   Group
/us_equities             Group

However, 04_alpha_factor_research/00_data/feature_engineering.ipynb throws the following error when it tries to read and filter the prices data set:

import pandas as pd

idx = pd.IndexSlice  # shorthand for slicing a MultiIndex by level
DATA_STORE = '../../data/assets.h5'
with pd.HDFStore(DATA_STORE) as store:
    prices = store['quandl/wiki/prices'].loc[idx['2000':'2018', :], 'adj_close'].unstack('ticker')
    stocks = store['us_equities/stocks'].loc[:, ['marketcap', 'ipoyear', 'sector']]
[...]
UnsortedIndexError: 'MultiIndex slicing requires the index to be lexsorted: slicing on levels [0], lexsort depth 0'

The prices data itself seems fine; only the filtering step appears to be breaking.

I am still fairly new to pandas, but I got the filter to work by explicitly creating a date range:

prices = store['quandl/wiki/prices'].loc[pd.date_range(start='1/1/2000', end='12/31/2018'), idx['adj_close']].unstack('ticker')

Thanks! Jeffrey
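Another option, not suggested in the thread, is to filter with a boolean mask built from the date level, because boolean indexing does not depend on the MultiIndex being lexsorted. A minimal sketch, assuming a hypothetical toy frame (made-up dates, tickers AAA/BBB, and values) standing in for store['quandl/wiki/prices']:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for store['quandl/wiki/prices']:
# a (date, ticker) MultiIndex whose rows are deliberately reversed (unsorted).
dates = pd.to_datetime(['1999-12-31', '2000-01-03', '2018-12-31', '2019-01-02'])
index = pd.MultiIndex.from_product([dates, ['AAA', 'BBB']], names=['date', 'ticker'])
prices = pd.DataFrame({'adj_close': np.arange(len(index), dtype=float)}, index=index).iloc[::-1]

# Build a boolean mask from the 'date' level; unlike label slicing,
# boolean indexing works regardless of the index's sort order.
date_level = prices.index.get_level_values('date')
mask = (date_level >= '2000-01-01') & (date_level <= '2018-12-31')

filtered = prices.loc[mask, 'adj_close'].unstack('ticker')
print(filtered)
```

The mask scans every row, so for repeated range queries on a large frame, sorting the index once and then slicing is usually the better trade-off.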

jeffreybreen commented 5 years ago

Also, the line import seaborn as sns is missing from this notebook.

stefan-jansen commented 5 years ago

Hi Jeffrey, the first error occurs because, as the message indicates, pandas requires a MultiIndex to be lexsorted before it can slice on it. You can fix this by calling .sort_index() on the DataFrame first. For some reason I'm not getting this error myself, but I've added a .sort_index() step to the create_datasets notebook so that the data is stored in sorted form.
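A minimal sketch of that fix, assuming a small synthetic frame in place of store['quandl/wiki/prices'] (the dates, tickers, and values are made up for illustration). Reversing the rows leaves the date level unsorted (lexsort depth 0), which reproduces the UnsortedIndexError from the report; calling .sort_index() first lets the same IndexSlice filter succeed:

```python
import numpy as np
import pandas as pd
from pandas.errors import UnsortedIndexError

idx = pd.IndexSlice

# Hypothetical toy data: a (date, ticker) MultiIndex with an 'adj_close' column.
dates = pd.to_datetime(['1999-12-31', '2000-01-03', '2018-12-31', '2019-01-02'])
index = pd.MultiIndex.from_product([dates, ['AAA', 'BBB']], names=['date', 'ticker'])
prices = pd.DataFrame({'adj_close': np.arange(len(index), dtype=float)}, index=index)

# Reverse the row order so the 'date' level is no longer sorted.
unsorted = prices.iloc[::-1]

try:
    unsorted.loc[idx['2000':'2018', :], 'adj_close']
    raised = False
except UnsortedIndexError as exc:
    raised = True
    print('UnsortedIndexError:', exc)

# Sorting the MultiIndex first makes the identical slice work.
filtered = (unsorted.sort_index()
            .loc[idx['2000':'2018', :], 'adj_close']
            .unstack('ticker'))
print(filtered)
```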

You're right about seaborn. I've been using a template that automatically imports a few packages and forgot to add the explicit import statements. The import is probably missing from a few other notebooks as well; I'll review and update them accordingly. Thanks for letting me know.