stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

Chapter 12, 04_preparing_the_model_data.ipynb #278

Closed ir0nt0ad closed 1 year ago

ir0nt0ad commented 1 year ago

In the deciles cells, pd.qcut doesn't want to handle series which are all NaN and throws "IndexError: cannot do a non-empty take from an empty axes." I'm using pandas 1.5.2.

I solved it somewhat crudely like this:

Daily historical return deciles

for t in T:
    prices[f'r{t:02}dec'] = (prices[f'r{t:02}']
                             .dropna()
                             .groupby(level='date')
                             .apply(lambda x: pd.qcut(x, 
                                                      q=10, 
                                                      labels=False, 
                                                      duplicates='drop')))

Daily sector return deciles

for t in T:
    prices[f'r{t:02}q_sector'] = (prices
                                  .groupby(['date', 'sector'])[f'r{t:02}']
                                  .transform(lambda x: pd.qcut(x, 
                                                               q=5, 
                                                               labels=False, 
                                                               duplicates='drop')
                                                       if not x.isnull().all() else np.nan))
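
The all-NaN guard from the sector snippet can also be pulled into a small helper and reused in both cells. This is a minimal sketch, not the notebook's code: the helper name safe_qcut and the toy data are assumptions made for illustration, and q=2 is used only because the toy sample has four tickers.

```python
import numpy as np
import pandas as pd


def safe_qcut(x: pd.Series, q: int) -> pd.Series:
    """Quantile-bucket x, or return all NaN when x has no data.

    Hypothetical helper for illustration; not part of the notebook.
    """
    # Skip pd.qcut entirely when the group holds no observations.
    if x.isna().all():
        return pd.Series(np.nan, index=x.index, dtype=float)
    return pd.qcut(x, q=q, labels=False, duplicates='drop')


# Toy data: on the first date no ticker has a return yet.
dates = pd.to_datetime(['2020-01-01', '2020-01-02'])
idx = pd.MultiIndex.from_product([['A', 'B', 'C', 'D'], dates],
                                 names=['ticker', 'date'])
returns = pd.Series([np.nan, 0.01, np.nan, 0.02,
                     np.nan, 0.03, np.nan, 0.04],
                    index=idx, name='r01')

# transform returns a result indexed like its input, one group per date.
buckets = (returns
           .groupby(level='date')
           .transform(lambda x: safe_qcut(x, q=2)))
print(buckets)
```

Dates without any data come back as NaN instead of raising, and the remaining dates are bucketed as before.
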
stefan-jansen commented 1 year ago

Thanks @sh0gg0th, there's probably not much one can do other than graceful error handling if there is simply no data available. Since it looks like you found a solution, I'll close this for now but pls feel free to reopen if you have further questions.

SUSHANTH009 commented 11 months ago

When I use the code above, every row of the f'r{t:02}dec' column comes out null.
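
The thread does not say which pandas version or data produced the all-null column, so the cause is not known. Two things may be worth ruling out: the input return column may itself be entirely NaN, or the result of groupby(...).apply(...) may carry a different index from prices (for instance, an extra group-key level), in which case the column assignment cannot align the values. The sketch below uses toy data standing in for prices (an assumption, with q=2 to fit the small sample) and swaps apply for transform, which returns a result indexed like its input.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the notebook's `prices` frame (hypothetical data).
idx = pd.MultiIndex.from_product(
    [['A', 'B', 'C', 'D'], pd.to_datetime(['2020-01-01', '2020-01-02'])],
    names=['ticker', 'date'])
prices = pd.DataFrame(
    {'r01': [np.nan, 0.01, np.nan, 0.02, np.nan, 0.03, np.nan, 0.04]},
    index=idx)

# Diagnostic: if the input column has no non-null values, the bucket
# column will be all NaN regardless of how the groupby is written.
print(prices['r01'].notna().sum(), 'non-null returns')

# transform keeps the index of its input, so the assignment aligns
# row by row; rows removed by dropna() simply stay NaN.
prices['r01dec'] = (prices['r01']
                    .dropna()
                    .groupby(level='date')
                    .transform(lambda x: pd.qcut(x,
                                                 q=2,
                                                 labels=False,
                                                 duplicates='drop')))
print(prices)
```

If the diagnostic prints a positive count but the new column is still all NaN, comparing prices.index with the index of the right-hand side before assignment should show whether the two line up.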