nshaud closed this issue 5 years ago
It still fails with the fasttext.load_facebook_model method. However, it works with the French embeddings:
import gensim
model = gensim.models.fasttext.load_facebook_model('/data/cc.fr.300.bin')
model.wv['test']
# array([ 0.03151339, -0.04408491, ... 0.0188015 , 0.032352 ], dtype=float32)
It also works with the Wikipedia English embeddings (wiki.en.bin).
Does this mean that there is something wrong with the format of cc.en.300.bin?
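One quick way to sanity-check whether a downloaded .bin file is a native fastText model at all is to inspect its header: recent fastText binaries begin with the magic int32 793712314 followed by a format version. Below is a minimal, hedged sketch; the magic constant comes from the fastText source (FASTTEXT_FILEFORMAT_MAGIC_INT32), little-endian byte order is assumed, and the sample file is synthetic rather than a real model:

```python
import struct

# FASTTEXT_FILEFORMAT_MAGIC_INT32 in the fastText source (assumed little-endian here)
FASTTEXT_MAGIC = 793712314

def has_fasttext_header(path):
    """Return (is_fasttext, version) by reading the first two little-endian int32s."""
    with open(path, 'rb') as f:
        header = f.read(8)
    if len(header) < 8:
        return False, None
    magic, version = struct.unpack('<ii', header)
    return magic == FASTTEXT_MAGIC, version

# Synthetic example: write a fake header and check it.
with open('fake_model.bin', 'wb') as f:
    f.write(struct.pack('<ii', FASTTEXT_MAGIC, 12))

print(has_fasttext_header('fake_model.bin'))  # (True, 12)
```

If the magic number does not match, the file is either an older-format binary or (as turned out to be the case here) a corrupted download.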
Thank you for reporting this. Could you provide full URLs to the models you are using, so I can try to reproduce this?
Here are all the models I mentioned:
I think gensim 3.7.2 already fixed this problem. Could you please double check?
(372.env) mpenkov@hetrad2:~/data/2435$ pip freeze | grep gensim
gensim==3.7.2
(372.env) mpenkov@hetrad2:~/data/2435$ cat bug.py
import gensim.models.fasttext
vector = gensim.models.fasttext.load_facebook_vectors('../cc.en.300.bin')
print(vector)
model = gensim.models.fasttext.load_facebook_model('../cc.en.300.bin')
print(model)
(372.env) mpenkov@hetrad2:~/data/2435$ python bug.py
<gensim.models.keyedvectors.FastTextKeyedVectors object at 0x7f815e2005c0>
FastText(vocab=2000000, size=300, alpha=0.025)
(372.env) mpenkov@hetrad2:~/data/2435$
I tried again with gensim 3.7.2 after re-downloading the model file from Facebook's fastText page, and it works now. The MD5 checksums of the old and new files differ, so a corrupted download was most likely the problem.
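For anyone hitting a similar AssertionError, comparing the checksum of the local file against a freshly downloaded copy is a quick way to rule out a corrupted download. A small stdlib-only sketch (the file names below are illustrative, not the real multi-gigabyte models):

```python
import hashlib

def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in chunks to limit memory use."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

# Illustrative only: create two small files differing by one byte and compare.
with open('model_old.bin', 'wb') as f:
    f.write(b'\x00' * 1024)
with open('model_new.bin', 'wb') as f:
    f.write(b'\x00' * 1023 + b'\x01')

print(md5sum('model_old.bin') == md5sum('model_new.bin'))  # False
```

Mismatched digests for what should be the same file mean the local copy is corrupted and should be re-downloaded.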
Problem description
I am trying to fine-tune a pretrained FastText model using gensim, with the weights from the official Facebook implementation. Partial loading (vectors only) works fine, but loading the full model results in an AssertionError.
Steps/code/corpus to reproduce
results in
Versions
Please provide the output of: