stefan-jansen / machine-learning-for-trading

Code for Machine Learning for Algorithmic Trading, 2nd edition.
https://ml4trading.io

Chapter02 Nasdaq parse itch order flow message bug #224

Closed: wushengzhong closed this issue 2 years ago.

wushengzhong commented 2 years ago

Describe the bug: I am trying to run the 01_NASDAQ_TotalView-ITCH_Order_Book example in 02_market_and_fundamental_data.

To Reproduce

The issue appears in the Get Message Labels section.

I successfully ran the code below:

import pandas as pd

message_data = (pd.read_excel('/message_types.xlsx',
                              sheet_name='messages')
                .sort_values('id')
                .drop('id', axis=1))

def clean_message_types(df):
    df.columns = [c.lower().strip() for c in df.columns]
    df.value = df.value.str.strip()
    df.name = (df.name
               .str.strip() # remove whitespace
               .str.lower()
               .str.replace(' ', '_')
               .str.replace('-', '_')
               .str.replace('/', '_'))
    df.notes = df.notes.str.strip()
    df['message_type'] = df.loc[df.name == 'message_type', 'value']
    return df

message_types = clean_message_types(message_data)

message_labels = (message_types.loc[:, ['message_type', 'notes']]
                  .dropna()
                  .rename(columns={'notes': 'name'}))
message_labels.name = (message_labels.name
                       .str.lower()
                       .str.replace('message', '')
                       .str.replace('.', '')
                       .str.strip().str.replace(' ', '_'))
# message_labels.to_csv('message_labels.csv', index=False)
message_labels.head()

message_types.message_type = message_types.message_type.ffill()
message_types = message_types[message_types.name != 'message_type']
message_types.value = (message_types.value
                       .str.lower()
                       .str.replace(' ', '_')
                       .str.replace('(', '')
                       .str.replace(')', ''))
message_types.info()

message_types.to_csv('./message_types.csv', index=False)
message_types = pd.read_csv('./message_types.csv')

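# `formats` is defined earlier in the notebook (not shown in this excerpt);
# it maps (value, length) pairs to struct format codes
message_types.loc[:, 'formats'] = (message_types[['value', 'length']]
                                   .apply(tuple, axis=1)
                                   .map(formats))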

alpha_fields = message_types[message_types.value == 'alpha'].set_index('name')
alpha_msgs = alpha_fields.groupby('message_type')
alpha_formats = {k: v.to_dict() for k, v in alpha_msgs.formats}
alpha_length = {k: v.add(5).to_dict() for k, v in alpha_msgs.length}

I ran into an issue with the code below.

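from collections import namedtuple

message_fields, fstring = {}, {}
for t, message in message_types.groupby('message_type'):
    # one namedtuple class and one big-endian struct format string per message type
    message_fields[t] = namedtuple(typename=t, field_names=message.name.tolist())
    fstring[t] = '>' + ''.join(message.formats.tolist())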
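For context, the loop is meant to build, for each message type, a namedtuple class holding the field names and a big-endian `struct` format string describing the field layout. The minimal sketch below assumes a hypothetical message type 'X' with three hypothetical fields and a hypothetical three-entry subset of `formats` (not the notebook's actual table). It shows how the two outputs would decode one synthetic binary message when every entry of `formats` is a string.

```python
from collections import namedtuple
import struct

import pandas as pd

# Hypothetical subset of `formats`: (value, length) -> struct format code
formats = {('integer', 2): 'H', ('integer', 4): 'I', ('alpha', 1): 's'}

# Hypothetical message spec in the same long layout as `message_types`
message_types = pd.DataFrame({
    'message_type': ['X', 'X', 'X'],
    'name': ['stock_locate', 'shares', 'buy_sell_indicator'],
    'value': ['integer', 'integer', 'alpha'],
    'length': [2, 4, 1],
})
message_types['formats'] = (message_types[['value', 'length']]
                            .apply(tuple, axis=1)
                            .map(formats))

message_fields, fstring = {}, {}
for t, message in message_types.groupby('message_type'):
    message_fields[t] = namedtuple(typename=t, field_names=message['name'].tolist())
    # '>' = big-endian byte order, standard sizes, no padding
    fstring[t] = '>' + ''.join(message['formats'].tolist())

# Round-trip a synthetic 7-byte message through the format string
payload = struct.pack(fstring['X'], 1, 100, b'B')
record = message_fields['X']._make(struct.unpack(fstring['X'], payload))
print(fstring['X'])  # >HIs
print(record)        # X(stock_locate=1, shares=100, buy_sell_indicator=b'B')
```

The sketch uses `message['name']` rather than `message.name` only for readability; both select the column here.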

Expected behavior: I expected this cell to build the message_fields and fstring dictionaries without errors.

Screenshots

Below is the error message I got.


TypeError                                 Traceback (most recent call last)
in
      2 for t, message in message_types.groupby('message_type'):
      3     message_fields[t] = namedtuple(typename=t, field_names=message.name.tolist())
----> 4     fstring[t] = '>' + ''.join(message.formats.tolist())

TypeError: sequence item 0: expected str instance, float found

- OS: MacOSX
- Version: Big Sur 11.4

02_market_and_fundamental_data/01_NASDAQ_TotalView-ITCH_Order_Book
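The TypeError says that `message.formats` contains a float where a string is expected, which here is most likely a NaN: `Series.map` with a dict returns NaN for any (value, length) pair that has no key in `formats`, and `str.join` accepts only strings. The thread does not state what caused the unmapped pair (the code later ran fine), so treat the following as a diagnostic sketch under assumptions. The mapping and message spec are hypothetical, with ('integer', 6) deliberately left out of `formats` to reproduce the error and to show how to list the offending rows before the join.

```python
import pandas as pd

# Hypothetical subset of `formats`; ('integer', 6) is deliberately missing
formats = {('integer', 2): 'H', ('alpha', 1): 's'}

# Hypothetical spec whose first field has no entry in `formats`
message_types = pd.DataFrame({
    'message_type': ['X', 'X'],
    'name': ['timestamp', 'buy_sell_indicator'],
    'value': ['integer', 'alpha'],
    'length': [6, 1],
})
message_types['formats'] = (message_types[['value', 'length']]
                            .apply(tuple, axis=1)
                            .map(formats))

# Series.map(dict) yields NaN (a float) for keys the dict lacks
unmapped = message_types[message_types['formats'].isna()]
print(unmapped[['message_type', 'name', 'value', 'length']])

try:
    fmt = '>' + ''.join(message_types['formats'].tolist())
except TypeError as exc:
    error_message = str(exc)
    print(error_message)  # sequence item 0: expected str instance, float found
```

Running the same `isna()` check on the real `message_types` before the loop should point to any (value, length) pairs that need an entry in `formats` or a correction in the spreadsheet.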
wushengzhong commented 2 years ago

I tried again and the code seems to work.