maaario / nlp-slovak-universal-language-model

MIT License

Get available slovak corpora / language models #6

Open maaario opened 5 years ago

maaario commented 5 years ago

Models: LM from 2013, https://korpus.sk/prim(2d)6(2e)1(2f)models.html

Corpora (also usable to train / evaluate our models):

  - https://korpus.sk/res.htm (Marek doubts we can get full access)
  - https://www.sketchengine.eu/user-guide/user-manual/corpora/by-language/slovak-text-corpora/ (30-day free trial, only some datasets)

Alternatively, we can ask Ivor to:

  1. train a simple LM (e.g. KenLM) on the corpus
  2. evaluate the perplexity of our models on the corpus
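
The two steps above (train a simple LM, then measure perplexity) can be sketched in a few lines. This is only an illustration: it uses a pure-Python bigram model with add-one smoothing as a stand-in for KenLM, not KenLM's API. The function names (`train_bigram_lm`, `log_prob`, `perplexity`) and the toy sentences are made up for the example.

```python
import math
from collections import Counter

BOS, EOS, UNK = "<s>", "</s>", "<unk>"

def train_bigram_lm(sentences):
    """Step 1: count contexts and bigrams over tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        contexts.update(padded[:-1])                  # every token that precedes another
        bigrams.update(zip(padded[:-1], padded[1:]))
    # Predictable vocabulary: seen words plus end-of-sentence and unknown.
    vocab = {w for tokens in sentences for w in tokens} | {EOS, UNK}
    return contexts, bigrams, vocab

def log_prob(model, prev, word):
    """Add-one smoothed natural-log P(word | prev)."""
    contexts, bigrams, vocab = model
    if word not in vocab:
        word = UNK
    if prev != BOS and prev not in vocab:
        prev = UNK
    return math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))

def perplexity(model, sentences):
    """Step 2: exp of the average negative log-probability per predicted token."""
    total, count = 0.0, 0
    for tokens in sentences:
        padded = [BOS] + tokens + [EOS]
        for prev, word in zip(padded[:-1], padded[1:]):
            total += log_prob(model, prev, word)
            count += 1
    return math.exp(-total / count)

# Toy data, invented for illustration only.
train = [["mama", "má", "emu"], ["ema", "má", "mamu"]]
model = train_bigram_lm(train)
print(perplexity(model, train))             # text the model has seen
print(perplexity(model, [["xyz", "abc"]]))  # unseen words, mapped to <unk>
```

Lower perplexity means the model finds the text less surprising. Perplexities are only comparable between models that share the same tokenization and vocabulary, so comparing our models against an n-gram baseline would need both evaluated on identical tokens.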

Useless: searching the internet for some. No data / models are public.

  - Slovak Language Model from Internet Text Data, 2010 -> no code/model/data found
  - Slovak n-grams @ fasttext ... word vectors might be useless

maaario commented 5 years ago

Next steps: