shgidi closed this issue 6 years ago
@shgidi
Facebook distributes two types of files:
.vec
contains ONLY word vectors (no n-grams here); can be loaded with KeyedVectors.load_word2vec_format
.bin
contains n-grams; can be loaded with FastText.load_fasttext_format
Next time, please ask on the mailing list.
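A minimal sketch of the distinction above, assuming gensim 3.x as discussed in this thread. The helper names `pick_loader` and `load_fasttext_file` are hypothetical and not part of gensim. Only the two loader calls come from the comment above. Newer gensim releases, to my knowledge, replace `FastText.load_fasttext_format` with `gensim.models.fasttext.load_facebook_model`, so check your installed version.

```python
from pathlib import Path


def pick_loader(path):
    """Name the gensim 3.x loader that matches a fastText file extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".vec":
        return "KeyedVectors.load_word2vec_format"
    if suffix == ".bin":
        return "FastText.load_fasttext_format"
    raise ValueError(f"unrecognized fastText file extension: {suffix!r}")


def load_fasttext_file(path):
    """Load a Facebook-distributed fastText file with the matching loader."""
    loader = pick_loader(path)
    # gensim is imported lazily so pick_loader works without it installed.
    if loader == "KeyedVectors.load_word2vec_format":
        from gensim.models import KeyedVectors
        # .vec is the text word2vec format: word vectors only, no n-grams.
        return KeyedVectors.load_word2vec_format(path, binary=False)
    from gensim.models import FastText
    # .bin keeps the n-grams; supervised models raise NotImplementedError.
    return FastText.load_fasttext_format(path)
```

Usage would look like `load_fasttext_file('wiki.en.bin')`, where the file name is only a placeholder for whichever file you downloaded.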
@menshikh-iv is this clear from our documentation?
I see people confused about these formats, how to load them and what can be done with them, all the time.
A clear, authoritative docs section would help us with support too (we could just point to it with a hyperlink).
@piskvorky I agree, this situation comes up sometimes; it is worth making a tutorial.
A tutorial would be ideal, but a simple paragraph in the docs would go a long way. Can you add it?
This is not working for me with gensim 3.5, python 3.6, and a FB model:
from gensim.models import FastText
model_yelp = FastText.load_fasttext_format('yelp_review_full.bin')
I get:
NotImplementedError: Supervised fastText models are not supported
@scottlittle please read the exception again: we really don't support supervised fastText models.
@shgidi https://github.com/facebookresearch/fastText/tree/master/python worked for me.
What is meant by supervised fasttext models and how to train for unsupervised?
@romass12
supervised fasttext models
Exactly what supervised learning means. The FB implementation supports a supervised mode; gensim supports only the unsupervised one.
how to train for unsupervised
Just read the Gensim FastText documentation.
When downloading fastText with this method, we get a folder with a file in standard word2vec format, which can be loaded with
model = KeyedVectors.load_word2vec_format(path, binary=False)
But not with
from gensim.models import FastText
model = FastText.load_fasttext_format(path, binary=False)
This disables the ability to get vectors for out-of-vocabulary words. How can this be done correctly?
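For context, per the earlier comment in this thread, the word2vec-format (.vec) file contains only whole-word vectors, so out-of-vocabulary lookups need the .bin file loaded with FastText.load_fasttext_format(path). As far as I know, that loader does not take a `binary` argument. The reason is that fastText builds vectors for unseen words from character n-grams. The sketch below is a simplified illustration, not gensim's or fastText's actual code: `char_ngrams` is a hypothetical helper, the 3 to 6 length range is assumed from fastText's defaults, and the real implementation additionally hashes n-grams into buckets.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of a word, fastText-style (simplified)."""
    # fastText wraps each word in boundary markers before slicing it.
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


# An unseen word still shares n-grams with known words, which is
# what lets a model that stores n-gram vectors build a vector for it.
print(char_ngrams("cat"))
# ['<ca', 'cat', 'at>', '<cat', 'cat>', '<cat>']
```

A .vec file stores none of these n-gram vectors, which is why KeyedVectors has nothing to compose an out-of-vocabulary vector from.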