Download the Hebrew Wikipedia dump: hewiki-latest-pages-articles.xml.bz2
On Linux this can be done with:
wget https://dumps.wikimedia.org/hewiki/latest/hewiki-latest-pages-articles.xml.bz2
Install gensim: pip install --upgrade gensim
(installation instructions: https://radimrehurek.com/gensim/install.html)
Build the corpus by running create_corpus.py: python create_corpus.py
This produces the plain-text corpus file wiki.he.text.
Train the model from a Python prompt.
Explore the model using a Jupyter notebook. The supplied playingWithHebModel.ipynb can serve as a starting point.
Optionally, install fastText for subword-based embeddings: pip install fasttext
Test model quality with Hebrew word analogies such as:
פריז + גרמניה - צרפת = ברלין (Paris + Germany - France = Berlin)
גבר + מלכה - מלך = אישה (man + queen - king = woman)