Closed hvgazula closed 2 years ago
https://github.com/hassonlab/247-pickling/blob/main/scripts/tfspkl_main.py#L242
script: `tfspkl_main.py`
function: `add_vocab_columns`
current code:
```python
df[f'in_{key}'] = df.word.apply(lambda x: calc_tokenizer_length(tokenizer, x))
```
replace with:
```python
df[f'in_{key}'] = df.word.apply(lambda x: x in tokenizer.get_vocab().keys())
```
@zkokaja @VeritasJoker
addressed in https://github.com/hassonlab/247-pickling/pull/68
Never mind, the proposed replacement is very slow. Closing.
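A likely reason the proposed one-liner is slow: `tokenizer.get_vocab()` builds a fresh vocab dict on every call, and the lambda calls it once per row. Hoisting the vocab out of the loop avoids that. A minimal sketch, using a plain dict as a hypothetical stand-in for a Hugging Face tokenizer's vocab (the real `tokenizer` and `calc_tokenizer_length` are not reproduced here):

```python
import pandas as pd

# Hypothetical stand-in for tokenizer.get_vocab(); with a real tokenizer,
# call get_vocab() once, outside the per-row lambda.
vocab = {"hello": 0, "world": 1, "Ġhello": 2}

df = pd.DataFrame({"word": ["hello", "world", "foobar"]})

# Slow pattern: rebuilds the vocab dict on every row.
#   df["in_key"] = df.word.apply(lambda x: x in tokenizer.get_vocab().keys())

# Faster: membership test against a vocab computed once.
df["in_key"] = df.word.isin(vocab.keys())

print(df["in_key"].tolist())  # [True, True, False]
```

`Series.isin` with the precomputed keys does the whole column in one vectorized pass, so the cost of building the vocab is paid once rather than per word.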