In small corpora, a language's morphological profile (how much it marks, and what it marks) might interact with the nearest-neighbor profiles of its word embeddings.
Test this on corpora of similar size to the Voynich manuscript, one per language:
Latin
Greek
French
Italian
Navajo
Mandarin
Arabic
Turkish
German
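One way to set up the comparison is sketched below. It is a minimal illustration, not a settled method: it cuts each corpus to the same token count and builds simple co-occurrence vectors in place of a trained embedding model. The names `truncate_tokens`, `cooccurrence_vectors`, `cosine` and `nearest_neighbors` are hypothetical, whitespace tokenization is an assumption, and the target token count is left as a parameter to be set from the Voynich transcription in use.

```python
import math
from collections import Counter, defaultdict


def truncate_tokens(text, n_tokens):
    """Whitespace-tokenize and keep the first n_tokens tokens (assumed size matching)."""
    return text.split()[:n_tokens]


def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within +/- window positions."""
    vecs = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                vecs[word][tokens[j]] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(vecs, word, k=5):
    """Return up to k words most similar to `word`, ties broken alphabetically."""
    target = vecs.get(word, Counter())
    scored = [(cosine(target, v), w) for w, v in vecs.items() if w != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [w for _, w in scored[:k]]
```

Running the same functions over each language's truncated corpus gives neighbor lists that can be compared across languages, for example by checking how often a word's neighbors are inflected forms of the same stem.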
Do the same with de-voweled corpora of the same languages.
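De-voweling could be sketched as follows for Latin-script text. This is an assumption-laden illustration: the function name `devowel` and the vowel set are hypothetical, `y` is not treated as a vowel, and accents attached to a removed vowel are dropped along with it. Languages written in other scripts (Greek, Arabic, Mandarin) would need their own vowel inventory or a romanization step first, which is not shown here.

```python
import unicodedata

# Assumed vowel inventory for Latin-script text only.
LATIN_VOWELS = frozenset("aeiouAEIOU")


def devowel(text, vowels=LATIN_VOWELS):
    """Remove vowels and any combining marks attached to them."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = []
    after_removed_vowel = False
    for ch in decomposed:
        if ch in vowels:
            after_removed_vowel = True
            continue
        if after_removed_vowel and unicodedata.combining(ch):
            # Diacritic belonging to a vowel that was just removed.
            continue
        after_removed_vowel = False
        kept.append(ch)
    return unicodedata.normalize("NFC", "".join(kept))
```

Diacritics on consonants are kept, so `garçon` becomes `grçn` rather than `grcn`.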