fastText interactions with morphological and orthographic profile

viking-sudo-rm / voynich2vec

Applying word2vec embeddings to the problem of deciphering the Voynich manuscript.


Issue #16

Open · chirila opened this issue 5 years ago


With a small training corpus, a language's morphological profile (how much morphology it has and what that morphology marks) may shape the nearest-neighbor structure of its fastText word embeddings. Test this on corpora of roughly the same size as the Voynich manuscript for each of the following languages:

Latin
Greek
French
Italian
Navajo
Mandarin
Arabic
Turkish
German
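One possible setup for this comparison is sketched below. It is a minimal sketch, not the repository's method. It truncates each language's corpus to a fixed token budget and reports type/token ratio as a crude proxy for morphological richness. It also scores how similar in spelling each word's nearest neighbors are, measured as character-trigram Jaccard overlap. The 38,000-token budget is an assumption, a commonly quoted ballpark for Voynich transliterations, so substitute the count of the transliteration actually used. All helper names are hypothetical. Vectors are passed as a plain dict so the sketch runs without fastText. With the official `fasttext` Python bindings they could instead come from `model.get_word_vector(w)` for each `w` in `model.words` after `fasttext.train_unsupervised(...)`. The letter-run tokenizer is not adequate for Mandarin, which is written without spaces between words.

```python
import math
import re
from typing import Dict, List, Optional, Sequence

# ASSUMPTION: ballpark token count for a Voynich transliteration; replace with the
# count of the transliteration actually used.
VOYNICH_TOKENS = 38_000


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters (drops digits and underscores).
    Not valid for unsegmented Mandarin text."""
    return re.findall(r"[^\W\d_]+", text.lower())


def truncate(tokens: Sequence[str], n: int = VOYNICH_TOKENS) -> List[str]:
    """Keep the first n tokens so every language gets the same corpus size."""
    return list(tokens[:n])


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Distinct words / total words: a crude proxy for morphological richness."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def char_ngrams(word: str, n: int = 3) -> set:
    """Character n-grams with fastText-style '<' and '>' word boundaries (trigrams only here)."""
    padded = f"<{word}>"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word: str, vectors: Dict[str, Sequence[float]], k: int = 10) -> List[str]:
    """Brute-force k nearest neighbors by cosine similarity (quadratic; small vocabularies only)."""
    target = vectors[word]
    scored = sorted(
        ((cosine(target, vec), other) for other, vec in vectors.items() if other != word),
        reverse=True,
    )
    return [other for _, other in scored[:k]]


def orthographic_overlap(
    vectors: Dict[str, Sequence[float]], k: int = 10, queries: Optional[Sequence[str]] = None
) -> float:
    """Mean trigram Jaccard similarity between each query word and its k nearest neighbors.

    Higher values suggest neighborhoods driven more by shared spelling (e.g. affixes)
    than by distributional context."""
    scores = []
    for word in queries if queries is not None else list(vectors):
        grams = char_ngrams(word)
        for other in nearest_neighbors(word, vectors, k):
            other_grams = char_ngrams(other)
            scores.append(len(grams & other_grams) / len(grams | other_grams))
    return sum(scores) / len(scores) if scores else 0.0
```

Both numbers would be computed per language on the truncated corpora and then compared across the list above. Passing a sample of `queries` keeps the brute-force neighbor search affordable on larger vocabularies.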

Repeat the test on de-voweled versions of the same corpora, to probe the effect of orthographic profile.
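A minimal de-voweling sketch for the alphabetic corpora follows. The vowel inventories, keyed by hypothetical ISO 639-1 codes, are illustrative assumptions and should be checked against each corpus's orthography. Accented and diacritic-bearing vowels (é, ü, ό, Navajo ǫ́) are caught by decomposing each character with Unicode NFD and testing its base letter. Arabic and Mandarin are deliberately left out. Arabic script normally leaves short vowels unwritten, so "de-voweled" there needs its own definition (for example, stripping optional harakat and possibly long-vowel letters). Mandarin written in characters has no separable vowel letters unless it is romanized first.

```python
import unicodedata

# ASSUMPTION: illustrative vowel inventories; verify per orthography.
# Accented forms are not listed because they are matched via their base letter.
VOWELS = {
    "la": set("aeiouy"),
    "el": set("αεηιουω"),
    "fr": set("aeiouyæœ"),
    "it": set("aeiou"),
    "nv": set("aeio"),
    "tr": set("aeıiou"),
    "de": set("aeiou"),
}


def devowel(text: str, lang: str) -> str:
    """Remove vowel letters (and any diacritics attached to them) from text."""
    vowels = VOWELS[lang]  # raises KeyError for unsupported languages (e.g. Arabic, Mandarin)
    out = []
    dropped_prev = False
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.combining(ch):
            # A combining mark left over after NFC: drop it if it sat on a removed vowel.
            if not dropped_prev:
                out.append(ch)
            continue
        base = unicodedata.normalize("NFD", ch)[0]  # e.g. 'ü' -> 'u', 'ό' -> 'ο'
        dropped_prev = base.lower() in vowels
        if not dropped_prev:
            out.append(ch)
    return "".join(out)
```

For example, `devowel("Gallia est omnis divisa", "la")` gives `"Gll st mns dvs"`. Word boundaries are kept so the de-voweled corpus can be tokenized and trained on exactly like the original.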