openlanguagedata / seed

Seed Machine Translation Data

How should language-specific statements be translated? #2

Closed avidale closed 7 months ago

avidale commented 8 months ago

I am currently piloting the translation of NLLB-Seed into Russian, as a pivot language for translating it into lower-resourced languages of Russia and possibly other post-Soviet countries.

One issue we encountered in the early stages is sentences that describe concepts specific to the English language. Example:

Names for the number 0 in English include zero, nought (UK), naught (US; ), nil, or—in contexts where at least one adjacent digit distinguishes it from the letter "O"—oh or o.

It seems to me that English words which are used not only to represent the meaning of the sentence but also to constitute its theme should not be translated. At least, this seems the most straightforward way to preserve the semantics and pragmatics of such sentences.

The translations of the example above would then look like this:

[Russian] Названия для числа 0 в английском языке включают zero, nought (UK), naught (US), nil, и, в контекстах, где хотя бы одна соседняя цифра отличает их от буквы "о", oh или o.

[French] Les noms pour le nombre 0 en anglais incluent zero, nought (UK), naught (US), nil, et, dans les contextes où au moins un chiffre adjacent les distingue de la lettre "o", oh ou o.
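One way to sanity-check this convention, as a minimal sketch rather than part of any OLDI tooling: list the English words that the sentence mentions, then confirm that each one appears verbatim in the translation. The function name `missing_mentions` and the hand-written term list are assumptions made for illustration, and the strings are shortened versions of the example.

```python
import re


def missing_mentions(translation: str, mentioned_terms: list[str]) -> list[str]:
    """Return the mentioned terms that do not appear verbatim in the translation.

    Hypothetical helper: the list of mentioned terms is assumed to be
    supplied by a human annotator, not detected automatically.
    """
    missing = []
    for term in mentioned_terms:
        # Require that the term is not embedded in a longer word.
        # Matching is case-sensitive on purpose: the term must be kept as-is.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        if not re.search(pattern, translation):
            missing.append(term)
    return missing


# English words that the source sentence talks about (assumed annotation).
terms = ["zero", "nought", "naught", "nil"]

# Shortened translation that keeps the English words untranslated.
ru_kept = "Названия для числа 0 в английском языке включают zero, nought (UK), naught (US) и nil."

# Shortened translation where "zero" was translated as "ноль".
ru_translated = "Названия для числа 0 в английском языке включают ноль, nought (UK), naught (US) и nil."

print(missing_mentions(ru_kept, terms))
print(missing_mentions(ru_translated, terms))
```

A check like this only flags candidates for review. It cannot decide on its own whether a given word is being mentioned or merely used, so that judgment stays with the translator.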

Is my intuition about this problem correct? Or is there another recommended approach for dealing with such sentences?

Regardless of the answer, it might make sense to add a recommendation for translating such sentences to the OLDI translation guidelines.

jeanm commented 8 months ago

This is a reasonable approach. Thanks for pointing this out; we'll make sure to include it in the guidelines!

jeanm commented 7 months ago

A line was added to the guidelines about this so I'm going to close the issue. Please feel free to reopen if you think it's still not clear.