ishefi / semantle-he

A Hebrew version of Semantle.

Unable to reproduce model #63

Closed: Man-with-Arrow closed this issue 1 year ago

Man-with-Arrow commented 1 year ago

Hi,

I've been playing around with Word2Vec and the model linked here, and I can't seem to reproduce the same distances.

For example:

Python 3.11.2 (main, Feb 12 2023, 00:48:52) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gensim
>>> model = gensim.models.Word2Vec.load('./wiki_tokenized_model/model.mdl')
>>> model.wv.similar_by_word('אשליה')
[('אשליית', 0.7949888110160828), ('אשלייתי', 0.7358855605125427), ('תחושה', 0.7196317911148071), ('סימולקרה', 0.7147767543792725), ('מתעתעת', 0.7013854384422302), ('השתקפות', 0.6864952445030212), ('אסטרלית', 0.6836147308349609), ('אשלייתית', 0.6831943392753601), ('אילוזיה', 0.6829365491867065), ('סיראנית', 0.6813762784004211)]

Note the similarity scores (the second element of each pair).

However, the score Semantle gives is different:

[Image: Screenshot from 2023-03-07 08-26-36]

Am I doing anything wrong? I'd love some feedback!

ishefi commented 1 year ago

The linked model was trained on the same data, but with different parameters. We do not share Semantle's model.
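
For context, the scores that `similar_by_word` returns are cosine similarities between word vectors, so they depend entirely on the vectors a particular model learned. Below is a minimal sketch of how two models can score the same word pair differently. It uses made-up 3-dimensional vectors rather than real model output; `model_a`, `model_b`, and their vectors are hypothetical and are not taken from either the linked model or Semantle's model.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical vectors for the same two words, standing in for what two
# models trained with different parameters might have learned.
model_a = {"word": [1.0, 0.0, 0.0], "neighbor": [1.0, 1.0, 0.0]}
model_b = {"word": [1.0, 0.0, 0.0], "neighbor": [1.0, 0.0, 2.0]}

score_a = cosine_similarity(model_a["word"], model_a["neighbor"])
score_b = cosine_similarity(model_b["word"], model_b["neighbor"])

print(f"model A: {score_a:.4f}")
print(f"model B: {score_b:.4f}")
```

The two printed scores differ even though the word pair is the same, which is the situation described above: without the exact model, the exact scores cannot be reproduced.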