masakhane-io / masakhane-mt

Machine Translation for Africa
MIT License

Fula Pulaar <-> English Resource (Sentence Pairs) #146

Open nikisix opened 3 years ago

nikisix commented 3 years ago

Hi, I would like to contribute a Pulaar translation model, but I need someone to point me to the sentence pairs. Can anyone help me out?

juliakreutzer commented 3 years ago

Hi @nikisix! It looks like JW300, which we used as a source for other languages, does not include Pulaar. You can look for other corpora on the OPUS website (https://opus.nlpl.eu/). It lists CCAligned, Wikimedia, Ubuntu and QED for Fula, but I'm not sure whether those contain Pulaar specifically. CCAligned was previously found (https://arxiv.org/abs/2103.12028) to contain mostly noise for Fula, so I would not recommend using it. Perhaps Wikimedia, Ubuntu or QED? These might be quite domain-specific, though.
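If you do pull one of these corpora, it helps to run a quick noise check before training. The sketch below assumes the corpus was downloaded as a pair of line-aligned plain-text files (one sentence per line, line *i* of one file translating line *i* of the other, as in a Moses-style download); the file layout, the helper names `read_moses_pairs` and `filter_pairs`, and the thresholds are all hypothetical. The filters are generic heuristics (empty sides, untranslated copies, duplicates, implausible length ratios), not the method used in the paper linked above.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, List, Tuple

Pair = Tuple[str, str]


def read_moses_pairs(src_path: str, tgt_path: str) -> Iterator[Pair]:
    """Hypothetical helper: yield (source, target) pairs from two line-aligned files."""
    with open(src_path, encoding="utf-8") as src, open(tgt_path, encoding="utf-8") as tgt:
        for i, (s, t) in enumerate(zip_longest(src, tgt)):
            # Misaligned files usually mean a broken download or extraction.
            if s is None or t is None:
                raise ValueError(f"files differ in length at line {i + 1}")
            yield s.strip(), t.strip()


def filter_pairs(
    pairs: Iterable[Pair], max_ratio: float = 2.0, max_tokens: int = 100
) -> List[Pair]:
    """Hypothetical helper: drop pairs that look like typical crawl noise."""
    kept: List[Pair] = []
    seen = set()
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # untranslated copy of the source
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src > max_tokens or n_tgt > max_tokens:
            continue  # overly long segments are often misaligned paragraphs
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too different to be plausible translations
        if (src, tgt) in seen:
            continue  # exact duplicate pair
        seen.add((src, tgt))
        kept.append((src, tgt))
    return kept
```

Usage would look like `filter_pairs(read_moses_pairs("corpus.en", "corpus.ff"))`, with the file names depending on what the download actually contains. Comparing how many pairs survive per corpus gives a rough sense of which of Wikimedia, Ubuntu or QED is worth a closer manual look.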

nikisix commented 3 years ago

I haven't used those last sources you mention before. I did notice that JW300 defines the code 'fub' for Pular, but unfortunately there are no supporting data files for it.