oborchers / Fast_Sentence_Embeddings

Compute Sentence Embeddings Fast!
GNU General Public License v3.0

issue with fasttext model #10

Closed peter-pogorelov closed 5 years ago

peter-pogorelov commented 5 years ago

The following code throws an error (TypeError: Cannot convert numpy.float32 to numpy.ndarray):

    fb = load_facebook_model(path_to_model)
    model = SIF(fb, alpha=1e-7, components=1)
    model.train([IndexedSentence(s, i) for i, s in enumerate(sentences)])
    # The next line raises the TypeError (query tokens: 'documents', 'accounting')
    model.sv.similar_by_sentence(['документы', 'бухгалтерия'], model=model, indexable=sentences)

However, if we load only the word vectors instead of the full model, everything seems to work:

    ft = KeyedVectors.load_word2vec_format(path_to_vectors)
    model = SIF(ft, alpha=1e-7, components=1)
    model.train([IndexedSentence(s, i) for i, s in enumerate(sentences)])
    model.sv.similar_by_sentence(['документы', 'бухгалтерия'], model=model, indexable=sentences)

This problem is really important because SIF weighting relies on word counts, and the counts available after loading only the vectors (ft.wv.vocab) are not the same as those stored in the full model. They look like they were recovered automatically from the vectors, perhaps using cosine similarity, although I am not sure about that.
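To illustrate why differing counts matter, here is a minimal sketch of the standard SIF weighting formula, weight(w) = alpha / (alpha + p(w)), applied to two sets of counts. It does not use fse or gensim. The function name `sif_weights`, the count values, the default `alpha`, and the idea that placeholder counts follow vocabulary rank are all assumptions made for illustration, not confirmed behaviour of either library.

```python
def sif_weights(counts, alpha=1e-3):
    """Return the SIF weight alpha / (alpha + p(w)) for every word."""
    total = sum(counts.values())
    return {w: alpha / (alpha + c / total) for w, c in counts.items()}


# Hypothetical corpus counts, as a full model might store them.
true_counts = {"the": 9000, "documents": 900, "accounting": 100}

# Hypothetical placeholder counts derived only from vocabulary rank
# (assumption: the most frequent word receives the highest count).
rank_counts = {w: len(true_counts) - i for i, w in enumerate(true_counts)}

w_true = sif_weights(true_counts)
w_rank = sif_weights(rank_counts)

for word in true_counts:
    print(f"{word:12s} true={w_true[word]:.4f} rank={w_rank[word]:.4f}")
```

Under these assumed numbers the two weightings diverge, so sentence embeddings built from vectors alone would not necessarily match those built from the full model.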

oborchers commented 5 years ago

Hi, thank you for the issue. I was already contacted about this, and the issue should now be resolved.

Make sure to upgrade to the latest version by pip install -U fse or by building from the master branch, as I've just released 0.1.15.
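After upgrading, one way to confirm that the installed release is at least 0.1.15 is a small check like the sketch below. It uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names `version_tuple` and `fse_is_at_least` are hypothetical, and the parsing assumes a dotted numeric version string, ignoring any non-numeric suffix.

```python
from importlib.metadata import PackageNotFoundError, version


def version_tuple(text):
    """Turn a dotted version such as '0.1.15' into (0, 1, 15).

    Parsing stops at the first non-numeric component.
    """
    parts = []
    for part in text.split("."):
        if not part.isdigit():
            break
        parts.append(int(part))
    return tuple(parts)


def fse_is_at_least(required="0.1.15"):
    """Return True or False, or None when fse is not installed."""
    try:
        installed = version("fse")
    except PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(required)


print(fse_is_at_least())
```

Comparing integer tuples rather than raw strings avoids the pitfall that "0.1.9" sorts after "0.1.15" as text.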

If the issue persists, please feel free to contact me again.