piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

AttributeError: 'FastText' object has no attribute 'intersect_word2vec_format' #2860

Open saharghannay opened 4 years ago

saharghannay commented 4 years ago

Problem description

I would like to fine-tune a fastText embedding model, pretrained on wiki data, on new in-domain data. I was using this code:

Steps/code/corpus to reproduce

model = KeyedVectors.load_word2vec_format(args.pretrained_model, binary=False)
model_Fasttext_cbow = FastText(size=args.vector_size, window=args.window, min_count=args.min_count, workers=8, sg=0)
model_Fasttext_cbow.build_vocab(sentences)
total_examples = model_Fasttext_cbow.corpus_count
model_Fasttext_cbow.build_vocab([list(model.wv.vocab.keys())], update=True)
model_Fasttext_cbow.intersect_word2vec_format(args.pretrained_model, binary=False, lockf=1.0)
model_Fasttext_cbow.train(sentences, total_examples=total_examples, epochs=5)

But I got this error: AttributeError: 'FastText' object has no attribute 'intersect_word2vec_format'

How can I fix this problem?

Versions

Please provide the output of:

import platform; print(platform.platform()): Linux-4.4.0-164-generic-x86_64-with-debian-stretch-sid
import sys; print("Python", sys.version): Python 3.7.7 [GCC 7.3.0]
import numpy; print("NumPy", numpy.__version__): NumPy 1.18.1
import scipy; print("SciPy", scipy.__version__): SciPy 1.4.1
import gensim; print("gensim", gensim.__version__): gensim 3.8.0
from gensim.models import word2vec;print("FAST_VERSION", word2vec.FAST_VERSION): FAST_VERSION 1

gojomo commented 4 years ago

There's no explicit support in gensim for any particular 'fine-tuning' operation. The .intersect_word2vec_format() method was an experimental offering, once available on Word2Vec (and thus inherited by some other classes), which a prior refactoring confined to Word2Vec only. Whether it still has any use, or could be adapted to other classes, is something a user would have to judge for themselves by reading the source code.

If you were pursuing a specific, well-documented fine-tuning approach, and had some specific feature need to support that, that could be a legitimate feature-request. (It's unlikely the original .intersect_word2vec_format() would be just right for any fine-tuning approach.) But we'd need a clearer, implementable description of what was needed.
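
For readers unfamiliar with what "intersecting" pretrained vectors means in this discussion, here is a minimal, library-free sketch of the general idea: words that appear in both the model's vocabulary and a word2vec-format text file take the file's vectors, words only in the file are ignored, and words only in the model are left alone. This is not gensim's implementation. The function names read_word2vec_text and intersect_vectors, the dict-based "model", and the toy data are all hypothetical, and the lockf value is only recorded here as an illustration of a per-word update factor, not applied by any training loop.

```python
import io


def read_word2vec_text(stream):
    """Parse word2vec text format: a 'count dim' header line,
    then one 'word v1 v2 ... vdim' line per word.
    (Hypothetical helper for illustration only.)"""
    header = stream.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in stream:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return dim, vectors


def intersect_vectors(model_vectors, pretrained, lockf=0.0):
    """Overwrite the vectors of words present in BOTH mappings.

    No words are added to model_vectors. Returns a dict mapping each
    overwritten word to lockf (a stand-in for a per-word lock factor).
    """
    locks = {}
    for word, vec in pretrained.items():
        if word not in model_vectors:
            continue  # word only in the pretrained file: ignored
        if len(vec) != len(model_vectors[word]):
            raise ValueError("dimension mismatch for %r" % word)
        model_vectors[word] = list(vec)
        locks[word] = lockf
    return locks


# Toy data standing in for a pretrained .vec file and a freshly built model.
pretrained_file = io.StringIO("3 2\ncat 0.1 0.2\ndog 0.3 0.4\nbird 0.5 0.6\n")
dim, pretrained = read_word2vec_text(pretrained_file)

model_vectors = {"cat": [0.0, 0.0], "fish": [9.0, 9.0]}
locks = intersect_vectors(model_vectors, pretrained, lockf=1.0)

print(model_vectors)  # "cat" now carries the pretrained vector; "fish" is unchanged
print(locks)          # only "cat" was overwritten
```

Note that a text-format vector file holds only full-word vectors, while a FastText model also composes words from character n-gram vectors, so overwriting full-word vectors in this way covers only part of what a FastText model learns. That is one reason such a merge may not be the right fit for a given fine-tuning approach.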