twadada / multilingual-nlm

Code for "Unsupervised Multilingual Word Embedding with Limited Resources using Neural Language Models" and "Learning Contextualised Cross-lingual Word Embeddings and Alignments for Extremely Low-Resource Languages Using Parallel Corpora"

Where to find the dataset? #1

Closed by bdqnghi 2 years ago

bdqnghi commented 3 years ago

Hi, where can I find the dataset mentioned in the description ("train.fr", "train.de", and "train.en")? Thanks.

twadada commented 3 years ago

Hi, you can use arbitrary monolingual data as inputs.
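For anyone preparing their own inputs, here is a minimal sketch of how such files might be produced. It assumes the training scripts read plain UTF-8 text with one sentence per line, which is not confirmed in this thread, so check the README for the actual format. The helper `write_monolingual_file` and the toy sentences are hypothetical and not part of this repository.

```python
import tempfile
from pathlib import Path


def write_monolingual_file(sentences, path):
    """Write one whitespace-normalised sentence per line; return the number written.

    Assumption: the expected input is plain UTF-8 text, one sentence per line.
    """
    count = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for sentence in sentences:
            line = " ".join(sentence.split())  # collapse runs of whitespace
            if not line:  # skip empty or whitespace-only entries
                continue
            f.write(line + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Toy, made-up sentences; replace them with any monolingual corpus.
    corpora = {
        "en": ["this is a sentence .", "here is another one ."],
        "fr": ["ceci est une phrase .", "en voici une autre ."],
        "de": ["das ist ein satz .", "hier ist noch einer ."],
    }
    out_dir = Path(tempfile.mkdtemp())  # temporary directory for the demo
    for lang, sentences in corpora.items():
        target = out_dir / f"train.{lang}"
        written = write_monolingual_file(sentences, target)
        print(f"{target}: {written} lines")
```

The corpora do not need to be parallel for this step; each language file is written independently.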