openlm-research / open_llama

OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B trained on the RedPajama dataset
Apache License 2.0
7.29k stars 372 forks

Corpora for Ukrainian #5

Closed egorsmkv closed 7 months ago

egorsmkv commented 1 year ago

Hello.

There is a corpus for Ukrainian called Ubercorpus that you could add to the project: https://lang.org.ua/en/corpora/#anchor4

UNLP, an event organized by the Ukrainian NLP community, takes place in a few days, and the second, larger version of the corpus will be presented there.

podarok commented 1 year ago

It would be nice to have the Ukrainian Ubercorpus added to OpenLLaMA: https://huggingface.co/openlm-research/open_llama_7b_preview_200bt/discussions/1