nathanshartmann / portuguese_word_embeddings

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks
GNU General Public License v3.0
240 stars, 35 forks

What does </s> mean? #4

Closed: bledson closed this issue 6 years ago

bledson commented 6 years ago

By any chance, does this have a special meaning, like 'unk' for rare/unknown words?

nathanshartmann commented 6 years ago

This token probably came from a user-generated content (UGC) corpus such as subtitles. Unfortunately, because our corpus is so large, some invalid tokens were not removed in the preprocessing step. The same thing happens in embedding models for English.
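For readers who want to drop such leftover tokens themselves, here is a minimal sketch, not the preprocessing used for this repository. It assumes the embeddings are stored in the word2vec text format (a "vocab_size dimensions" header line followed by one "token v1 v2 ..." line per word). The function name `drop_tag_like_tokens` and the pattern `TAG_LIKE` are hypothetical names introduced only for illustration.

```python
import re

# Hypothetical pattern: tokens that look like a lone markup tag, e.g. "</s>" or "<i>".
TAG_LIKE = re.compile(r"^</?\w+/?>$")


def drop_tag_like_tokens(lines):
    """Filter a word2vec-style text file given as a list of lines.

    Assumption: the first line is the "<vocab_size> <dimensions>" header and
    every other line is "<token> <v1> <v2> ...".
    Returns (new_lines, removed_tokens).
    """
    header, rows = lines[0], lines[1:]
    dims = int(header.split()[1])
    kept, removed = [], []
    for row in rows:
        token = row.split(" ", 1)[0]
        if TAG_LIKE.match(token):
            removed.append(token)
        else:
            kept.append(row)
    # Rewrite the header so the vocabulary count matches the kept rows.
    return [f"{len(kept)} {dims}"] + kept, removed


# Tiny in-memory sample standing in for a real embeddings file.
sample = [
    "3 2",
    "casa 0.1 0.2",
    "</s> 0.0 0.0",
    "gato 0.3 0.4",
]
cleaned, removed = drop_tag_like_tokens(sample)
print(cleaned)
print(removed)
```

The regular expression is deliberately narrow: it only matches tokens that consist entirely of one tag, so ordinary words and punctuation are left alone. Whether removing these tokens is appropriate depends on the downstream task.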

I hope this answers your question.

Nathan Siegle Hartmann

On May 16, 2018, at 22:57, Bledson Kivy notifications@github.com wrote:

By any chance, does this have a special meaning, like 'unk' for rare/unknown words?


bledson commented 6 years ago

Thanks! It was exactly that.