Dictionary - handling OOV tokens

salesforce / awd-lstm-lm

LSTM and QRNN Language Model Toolkit for PyTorch

Dictionary - handling OOV tokens #65

Open chiphuyen opened 6 years ago

chiphuyen commented 6 years ago

I was looking at data.py and saw that the dictionary is built from all tokens in the train, val, and test files. Does adding tokens that appear only in the val/test files to the dictionary affect testing in any way? Thanks!

gyuwankim commented 6 years ago

Agreed. This may be fine for the benchmark datasets, but it seems problematic in a real-world scenario, where val/test tokens are not known when the dictionary is built.
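
For illustration, one common way to avoid the concern raised above is to build the dictionary from the training split only and map every other token to a placeholder. The following is a minimal sketch, not the repository's data.py: the class name `TrainOnlyDictionary`, the helpers `build_dictionary` and `encode`, and the `<unk>` / `<eos>` token strings are all assumptions made for this example.

```python
UNK = "<unk>"  # assumed placeholder for out-of-vocabulary tokens
EOS = "<eos>"  # assumed end-of-line marker


class TrainOnlyDictionary:
    """Hypothetical word <-> index mapping that reserves index 0 for UNK."""

    def __init__(self):
        self.word2idx = {UNK: 0}
        self.idx2word = [UNK]

    def add_word(self, word):
        # Register a word the first time it is seen and return its index.
        if word not in self.word2idx:
            self.word2idx[word] = len(self.idx2word)
            self.idx2word.append(word)
        return self.word2idx[word]

    def lookup(self, word):
        # Read-only lookup: unseen words fall back to the UNK index.
        return self.word2idx.get(word, self.word2idx[UNK])

    def __len__(self):
        return len(self.idx2word)


def build_dictionary(train_lines):
    # Only the training split is allowed to grow the vocabulary.
    dictionary = TrainOnlyDictionary()
    for line in train_lines:
        for word in line.split() + [EOS]:
            dictionary.add_word(word)
    return dictionary


def encode(dictionary, lines):
    # Val/test splits are encoded with lookup(), so they never add entries.
    return [dictionary.lookup(word)
            for line in lines
            for word in line.split() + [EOS]]


train_lines = ["the cat sat", "the dog sat"]
test_lines = ["the bird sat"]

dictionary = build_dictionary(train_lines)
test_ids = encode(dictionary, test_lines)
print(len(dictionary), test_ids)
```

With this setup the vocabulary size, and therefore the shape of the embedding and softmax layers, depends only on the training data; "bird" in the test line is encoded as the `<unk>` index instead of receiving its own untrained entry.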