Open nikisix opened 3 years ago
Hi @nikisix! It looks like JW300, which we used as a source for other languages, does not include Pulaar. On the OPUS website you can look for other corpora: https://opus.nlpl.eu/. It lists CCAligned, Wikimedia, Ubuntu, and QED for Fula, but I'm not sure whether any of that is Pulaar. The CCAligned corpus was previously found (https://arxiv.org/abs/2103.12028) to contain mostly noise for Fula, so I would not recommend using it. Perhaps Wikimedia, Ubuntu, or QED? These might be quite domain-specific, though.
I haven't used those last sources you mention before. I did notice JW300 has the code 'fub' defined for Pular, but unfortunately there are no supporting data files.
Hi, I would like to contribute a Pulaar translator model, but I need to be pointed to the sentence pairs. Can anyone help me out?