How exactly FastText and CharNGram embedding work in Torchtext

pytorch / text

Models, data loaders and abstractions for language processing, powered by PyTorch

https://pytorch.org/text

BSD 3-Clause "New" or "Revised" License

How exactly FastText and CharNGram embedding work in Torchtext #461

Open MFajcik opened 6 years ago

MFajcik commented 6 years ago

Hi, I have two questions. Firstly, I know that in the past I found documentation on how the CharNGram embedding works (either somewhere in the code or in the docs), but I cannot find it right now.

Secondly, I would like to know whether FastText uses precomputed word vectors for the pretrained vocabulary, or whether calling it within build_vocab constructs embeddings from pre-trained n-grams for the training data vocabulary. In other words, does build_vocab handle words that are outside the pretrained vocabulary?

Thank you for the information.

zhangguanheng66 commented 5 years ago

A unit test could be added as a good example.