Proposal for replacing line of code checking if word is in model vocabulary

hassonlab / 247-pickling

Contains code to create pickles from raw/processed data

Issue #70, closed by hvgazula 2 years ago

hvgazula commented 2 years ago

https://github.com/hassonlab/247-pickling/blob/main/scripts/tfspkl_main.py#L242

script: tfspkl_main.py
function: add_vocab_columns

code:
    df[f'in_{key}'] = df.word.apply(lambda x: calc_tokenizer_length(tokenizer, x))

replace with:
    df[f'in_{key}'] = df.word.apply(lambda x: x in tokenizer.get_vocab().keys())

@zkokaja @VeritasJoker

hvgazula commented 2 years ago

Addressed in https://github.com/hassonlab/247-pickling/pull/68

hvgazula commented 2 years ago

Never mind, the proposed replacement is very slow. Closing.
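
A plausible reason for the slowdown, offered as an assumption rather than something measured in this thread: the proposed lambda calls `tokenizer.get_vocab()` once per row, and if each call builds a fresh dictionary, the cost grows with vocabulary size times the number of words. The sketch below illustrates this with a hypothetical stand-in tokenizer (`CountingTokenizer`, not part of this repository or of any library) that counts how often `get_vocab()` runs, and compares the per-word lookup with a version that fetches the vocabulary once.

```python
class CountingTokenizer:
    """Hypothetical stand-in for a tokenizer that exposes get_vocab()."""

    def __init__(self, vocab):
        self._vocab = dict(vocab)
        self.get_vocab_calls = 0

    def get_vocab(self):
        # Assumption: every call returns a fresh copy of the vocabulary,
        # so each call costs time proportional to the vocabulary size.
        self.get_vocab_calls += 1
        return dict(self._vocab)


def in_vocab_per_word(tokenizer, words):
    # Same shape as the proposed lambda: get_vocab() runs once per word.
    return [w in tokenizer.get_vocab().keys() for w in words]


def in_vocab_hoisted(tokenizer, words):
    # Fetch the vocabulary once, then do constant-time set lookups.
    vocab = set(tokenizer.get_vocab())
    return [w in vocab for w in words]


words = ["hello", "world", "xyzzy", "hello"]
vocab = {"hello": 0, "world": 1}

slow_tok = CountingTokenizer(vocab)
fast_tok = CountingTokenizer(vocab)

slow_flags = in_vocab_per_word(slow_tok, words)
fast_flags = in_vocab_hoisted(fast_tok, words)
```

Both functions return the same flags; only the number of `get_vocab()` calls differs. If the same idea were applied in `add_vocab_columns`, it might look like `vocab = set(tokenizer.get_vocab())` followed by `df[f'in_{key}'] = df.word.isin(vocab)`, though that variant was not tried in this thread and may not match the behavior of `calc_tokenizer_length`.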