Closed (bledson closed this 6 years ago)
This token probably came from a user-generated content (UGC) corpus, such as subtitles. Unfortunately, because our corpus is so large, some spurious tokens were not removed in the preprocessing step. The same thing happens in embedding models for English.
I hope this answers your question.
Nathan Siegle Hartmann
On May 16, 2018, at 22:57, Bledson Kivy notifications@github.com wrote:
By any chance, does this have a special meaning, like 'unk' for rare/unknown words?
Thanks! It was exactly that.