akutuzov opened this issue 6 years ago
Since the training corpus was lemmatized with Mystem, Russian texts should be lemmatized the same way before being used with this model, for example with pymystem3:
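```python
from pymystem3 import Mystem

m = Mystem()  # wraps an external binary: create once and reuse

def tag(word):
    # analyze() returns a list of token dicts; take the first token's first analysis
    processed = m.analyze(word)[0]
    analyses = processed.get("analysis")
    if not analyses:  # e.g. punctuation: no analysis, fall back to the surface form
        return processed["text"].lower().strip()
    return analyses[0]["lex"].lower().strip()

print(tag('стульев'))  # стул
```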
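The `tag()` helper handles a single word. For running text, a minimal sketch, assuming pymystem3's `Mystem.lemmatize()` (which returns lemmas interleaved with whitespace and punctuation tokens), could keep only the content lemmas. The function name `lemmatize_text` and its `lemmatize` parameter are hypothetical, introduced here so a single Mystem instance can be passed in and reused:

```python
def lemmatize_text(text, lemmatize=None):
    """Return the lemmas of `text`, dropping the whitespace/punctuation tokens
    that Mystem.lemmatize() interleaves with them.

    `lemmatize` (hypothetical parameter) is any callable returning such a token
    list, e.g. the bound method of one shared Mystem instance.
    """
    if lemmatize is None:
        from pymystem3 import Mystem  # imported lazily; needs the mystem binary
        lemmatize = Mystem().lemmatize
    tokens = lemmatize(text)
    # keep only tokens containing at least one letter or digit
    return [tok.strip() for tok in tokens if any(ch.isalnum() for ch in tok)]
```

For example, `lemmatize_text("Красивая мама красиво мыла раму", lemmatize=m.lemmatize)` should yield a plain list of lemmas ready to be looked up in the model.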
I got the following error:
```
>>> model = gensim.models.fasttext.FastText.load('araneum_none_fasttextcbow_300_5_2018.model')
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/fasttext.py", line 936, in load
    model = super(FastText, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/base_any2vec.py", line 1247, in load
    if not hasattr(model.vocabulary, 'ns_exponent'):
AttributeError: 'FastTextKeyedVectors' object has no attribute 'vocabulary'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/fasttext.py", line 945, in load
    return load_old_fasttext(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/deprecated/fasttext.py", line 53, in load_old_fasttext
    old_model = FastText.load(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/deprecated/word2vec.py", line 1618, in load
    model = super(Word2Vec, cls).load(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/deprecated/old_saveload.py", line 87, in load
    obj = unpickle(fname)
  File "/usr/local/lib/python3.5/dist-packages/gensim/models/deprecated/old_saveload.py", line 380, in unpickle
    return _pickle.loads(file_bytes, encoding='latin1')
AttributeError: Can't get attribute 'FastTextKeyedVectors' on <module 'gensim.models.deprecated.keyedvectors' from '/usr/local/lib/python3.5/dist-packages/gensim/models/deprecated/keyedvectors.py'>
```
@andrei-q Gensim's fastText code has been refactored since this issue was created.
In recent versions of Gensim, you should load this model with `gensim.models.KeyedVectors.load()`.
I've changed the code snippet above accordingly.
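A minimal sketch of that advice, assuming a recent Gensim and the model file name from the traceback. The helper names `load_araneum_vectors` and `neighbours` are hypothetical; `neighbours` lemmatizes the query first because the training corpus was lemmatized:

```python
def load_araneum_vectors(path="araneum_none_fasttextcbow_300_5_2018.model"):
    # Imported lazily so this sketch can be defined without gensim installed.
    from gensim.models import KeyedVectors
    # Load with KeyedVectors.load(), not FastText.load(), per the comment above.
    return KeyedVectors.load(path)


def neighbours(wv, word, lemmatize, topn=5):
    """Lemmatize `word` (e.g. with tag() above), then query its nearest neighbours."""
    lemma = lemmatize(word)
    return wv.most_similar(lemma, topn=topn)
```

Usage, assuming the archive has been extracted to the working directory: `wv = load_araneum_vectors()` followed by `neighbours(wv, 'стульев', tag)`.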
Thanks, it works.
Name: fasttext-ru_araneum-300
Link: http://rusvectores.org/static/models/rusvectores4/fasttext/araneum_none_fasttextcbow_300_5_2018.tgz
Description: fastText vectors trained on Araneum Russicum Maximum corpus (about 10 billion words). The model contains 196K words and 403K 3-4-5-grams.
License: CC-BY (http://rusvectores.org/en/about/)
Related papers: https://arxiv.org/abs/1801.06407, https://www.academia.edu/24306935/WebVectors_a_Toolkit_for_Building_Web_Interfaces_for_Vector_Semantic_Models
Preprocessing: The corpus was lemmatized with Mystem.
Parameters: vector size 300, window size 5
Code example:
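The description says the model stores 3-4-5-grams alongside its 196K words. As a hedged illustration (not the model's or Gensim's own code), fastText-style character n-grams wrap the word in `<` and `>` boundary markers and take every substring of length 3 to 5; a vector for an out-of-vocabulary word can then be built by averaging the vectors of its n-grams. Real fastText hashes n-grams into buckets; the plain dict `ngram_vectors` below is a hypothetical simplification:

```python
def char_ngrams(word, minn=3, maxn=5):
    """Character n-grams of lengths minn..maxn after wrapping the word in '<' and '>'."""
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    return grams


def oov_vector(word, ngram_vectors, dim):
    """Average the vectors of the word's n-grams that appear in `ngram_vectors`
    (a hypothetical dict n-gram -> list of floats); zero vector if none match."""
    grams = [g for g in char_ngrams(word) if g in ngram_vectors]
    if not grams:
        return [0.0] * dim
    return [sum(ngram_vectors[g][k] for g in grams) / len(grams) for k in range(dim)]
```

For instance, `char_ngrams('кот')` returns `['<ко', 'кот', 'от>', '<кот', 'кот>', '<кот>']`.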