sillsdev / serval

A REST API for natural language processing services
MIT License

Mix multiple sources and one target #309

Closed johnml1135 closed 7 months ago

johnml1135 commented 9 months ago

Spin-off of https://github.com/sillsdev/serval/issues/266 for the Serval implementation.

Mixing multiple sources from different NLLB-200 languages has been shown to give a big improvement, especially when the backtranslation language differs from the source text language (say, English backtranslation with Spanish source text). Including the target sentences multiple times makes the behavior worse (the model memorizes the target sentences), but interweaving the target with 2 different sources, so that each source covers about half of the target sentences, works pretty well. How can this be implemented?

Proposal:

johnml1135 commented 9 months ago

So: a 50% random mix, and if one source has no text for a segment, use the other source. Only 2 sources are needed. Follow the SILNLP implementation.
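
A minimal Python sketch of that rule, for discussion only. It is not the SILNLP code. The function name `mix_sources`, the `seed` parameter, and the choice to skip segments where both sources (or the target) are empty are all assumptions. Each target sentence is emitted at most once, so the target is never duplicated.

```python
import random
from typing import List, Optional, Tuple


def mix_sources(
    source_a: List[str],
    source_b: List[str],
    target: List[str],
    seed: Optional[int] = None,
) -> List[Tuple[str, str]]:
    """Pair each target segment with exactly one of two aligned sources.

    Hypothetical helper: the three lists are assumed to be aligned
    segment-by-segment (same length, same order).
    """
    if not (len(source_a) == len(source_b) == len(target)):
        raise ValueError("source_a, source_b and target must be aligned")

    rng = random.Random(seed)  # seeded so the mix is reproducible
    pairs: List[Tuple[str, str]] = []
    for a, b, tgt in zip(source_a, source_b, target):
        a, b, tgt = a.strip(), b.strip(), tgt.strip()
        if not tgt or (not a and not b):
            # Assumption: nothing usable to train on, so skip the segment.
            continue
        if a and b:
            # Both sources have text: 50% random choice between them.
            src = a if rng.random() < 0.5 else b
        else:
            # Only one source has text: fall back to it.
            src = a or b
        pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    spanish = ["En el principio", "", "Y dijo Dios"]
    english = ["In the beginning", "And the earth was", ""]
    target = ["tgt 1", "tgt 2", "tgt 3"]
    for src, tgt in mix_sources(spanish, english, target, seed=0):
        print(f"{src}\t{tgt}")
```

Choosing the source per segment, rather than concatenating two full source/target corpora, is what keeps every target sentence from appearing twice.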

johnml1135 commented 9 months ago

Proposal: for implementation, when posting the single corpora:

johnml1135 commented 9 months ago

@ddaspit - here is a proposal of the changes:

ddaspit commented 8 months ago

We need API documentation for this feature.