r-devel / r-project-sprint-2023

Material for the R project sprint
https://contributor.r-project.org/r-project-sprint-2023/

Support for internationalization of help pages #35

Open hturner opened 1 year ago

hturner commented 1 year ago

Discussed in https://github.com/r-devel/r-project-sprint-2023/discussions/19

Originally posted by **jeroen** August 11, 2023

At rOpenSci there is interest in providing translations of package help pages for some of our packages, initially for Spanish-speaking users. Currently R has support for translating messages but, to the best of my knowledge, not for help pages. We are interested to know whether prior work has been done in this area, where it stalled, and whether this is something we could contribute to.

Ideally, the package author could write man pages in such a way that the user experience would be the same as for messages: when a user does `?topic`, a manual page in the local language would be opened and, if none is available, R would fall back on the English manual page.

One way would be to allow for e.g. a `man-es` directory in the source package (in addition to the regular `man` directory), with Spanish versions of the manuals. A drawback may be that this implies a full copy of the manual pages, which might complicate R CMD check. Alternatively, maybe the `Rd` format could be extended to allow for particular multilingual sections.

@eliocamp mentioned:

> It'd be interesting to talk to r core about it. We've exchanged emails with Hadley about this some years ago and it led to this proof of concept: https://github.com/eliocamp/translated
>
> It uses some weird feature of help that allows you to type es?function to get to an .Rd with a function-es alias. I don't think this is the intended use, but it's what we currently have. But before running with it it might be good to talk with r core.

@maelle @yabellini
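
As a rough illustration of the fallback behaviour described above (not something R currently implements), here is a language-neutral sketch in Python. The `man-<lang>` directory layout follows the `man-es` suggestion. The function names `language_candidates` and `find_help_page`, and the colon-separated language setting (modelled on gettext's `LANGUAGE` variable), are assumptions made for this example only.

```python
from pathlib import Path


def language_candidates(language):
    """Expand a setting such as 'es_AR.UTF-8:pt' into ['es_AR', 'es', 'pt'].

    Hypothetical helper: each entry is tried as given, then with the
    region suffix removed, keeping the order of first appearance.
    """
    candidates = []
    for tag in language.split(":"):
        tag = tag.split(".")[0].strip()  # drop an encoding suffix such as '.UTF-8'
        if not tag or tag in ("C", "POSIX"):
            continue
        base = tag.split("_")[0]  # 'es_AR' -> 'es'
        for candidate in (tag, base):
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


def find_help_page(pkg_dir, topic, language):
    """Return the best matching .Rd file for `topic`, or None if there is none.

    Looks in man-<lang> for each candidate language first, then falls
    back on the regular (English) man directory.
    """
    pkg_dir = Path(pkg_dir)
    for lang in language_candidates(language):
        page = pkg_dir / f"man-{lang}" / f"{topic}.Rd"
        if page.is_file():
            return page
    fallback = pkg_dir / "man" / f"{topic}.Rd"
    return fallback if fallback.is_file() else None
```

With this layout, a topic that has no Spanish page resolves to the file in `man`, so a partial translation would still give users a help page for every topic.
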
mmaechler commented 1 year ago

Some caveats, although late, I'm sorry: Within R core we have had discussions about this, though not at all recently. IIRC, a main reason we thought the idea was difficult and could even be counter-productive: We consider the union of help pages for a package to be something like "the reference manual" or, in other words, "the authoritative documentation" (that's why documentation errors are also bugs to be fixed). If this is translated by someone who does not understand the subtle details of this reference and the functionality it describes (which is typically the case, as they are not the package authors), the translation may easily be somewhat vague or misleading, but the readers of that translation can hardly notice that.

Many know this experience when buying a technical device and reading the usage instructions in their own language where they are not so useful (sometimes even obviously close to garbage). We then switch to the original language instructions, traditionally very often in English, and find those instructions much better even though they are in a language we speak/read much less fluently.

Secondly, nowadays, with automatic translators, e.g., Google Lens, often being of sufficient quality and available to a considerable part of the world population, many are used to just pointing their smartphones at a text to get it translated in "no time"... with the advantage of easily flipping between translation and original.

All that said should still not discourage the project, because there clearly are situations, notably in teaching, where the "plugin" availability of translated help pages for some crucial R package may help very much to get a class of students up to speed on that package's topics.

And I continue to believe it may be most fruitful to think about making this modular, and even to consider using, i.e., slightly extending, the R package standard for this, so people could install and load `<package>-<language>` translations as seamlessly as regular R packages.

hturner commented 1 year ago

Thanks Martin. Your comment about Google Lens reminded me of this comment in a discussion I stumbled across the other day:

> I spent hours correcting the translations from Deepl that someone accepted. Had to go through all strings again to ensure that they don’t make AntennaPod look bad when used on the website.
>
> Deepl used literal translations which simply didn’t make sense in German. It used the wrong formal/informal form. It produced sentences that sound awful, even though they might be technically correct. It did not respect the context of sentences (eg breaking grammar in enumerations, making every item a full sentence). It used naming in the documentation that does not match the naming inside the app (eg “Forward” as in email vs “Forward” as in a media player).
>
> A human translator who knows what app they are translating would never have made most of these mistakes. I fear that if we add the machine translations again, people will again start accepting suggestions without thinking about them - they will maybe check whether the translation is valid German, but will not think about whether it makes sense for AntennaPod.
>
> -- https://forum.antennapod.org/t/how-to-enable-machine-translation/2630/6

This was about DeepL, which is considered one of the better automatic translators. While human translators may not be experts in the package they are translating, they are likely to be more expert than an automatic translator and therefore less likely to introduce errors in translation.

Given the thousands of potential packages, it clearly makes sense to focus on well-used packages where there is a team of interested translators; so both the strength of the translation team and the community of users could help to maintain high quality (e.g. students may not recognise a misleading translation, but their lecturer is more likely to).

eliocamp commented 1 year ago

A short summary after the first session. This could be a possible infrastructure:

eliocamp commented 1 year ago

A question that didn't come up: what happens if the user installs multiple translation modules for the same package and language?

mmaechler commented 1 year ago

I think you should typically install only one Brazilian translation for pkg {foo}. But if you have more than one library, you can already install different versions (or the same one) of a regular package into the different libraries (entries of `.libPaths()`), and the same would almost surely happen with translation modules.

lawremi commented 1 year ago

We could disambiguate like we do for duplicate aliases, so some sort of menu.
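
To make the two replies above concrete, here is a small Python sketch of how duplicate translation modules spread over several libraries might be resolved. It is only an illustration under stated assumptions: the `<package>-<language>` directory naming is borrowed from the suggestion earlier in the thread, and `find_translation_modules`, `choose_module` and the `ask` callback (standing in for an interactive menu) are hypothetical names, not part of R.

```python
from pathlib import Path


def find_translation_modules(lib_paths, package, language):
    """List every installed '<package>-<language>' directory.

    Results keep the order of `lib_paths`, mirroring how earlier
    library entries take precedence for regular packages.
    """
    name = f"{package}-{language}"
    return [Path(lib) / name for lib in lib_paths if (Path(lib) / name).is_dir()]


def choose_module(matches, ask=None):
    """Pick one module from `matches`.

    No match gives None and a single match is returned directly. With
    several matches, `ask` (a callable that receives the list and returns
    an index, e.g. an interactive menu) decides; without it, the first
    match in library order wins.
    """
    if not matches:
        return None
    if len(matches) == 1 or ask is None:
        return matches[0]
    return matches[ask(matches)]
```

Taking the first match by default keeps non-interactive sessions deterministic, while the optional callback leaves room for the menu-style disambiguation.
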

yabellini commented 1 year ago

How great to see this project being discussed! Thanks so much, @hturner, for the tag.

For the rOpenSci multilingual publishing project (and based on the experience of several other translations within the R community), we use an automatic translator (DeepL) to generate a first automatic translation that two bilingual people then review. These people are also developers, so they have the technical background and knowledge of the discipline's jargon.

I agree with the comments of AntennaPod: You can't rely on machine translation alone, but you can use it to reduce the time it takes to produce a quality translation. This way, you automate one task, and reviewers will focus on linguistic tasks such as using inclusive language, localization of examples, correctly translating phrases, metaphors, or analogies, or looking for reference material in the language of the translation.

It is also necessary to agree on how and what to translate (some technical terms remain in English) to maintain quality while allowing several people to participate. It is also essential to establish a common glossary and roles. We call these our translation guidelines.

We know that code peer review ensures the quality of our software. With the same goal of ensuring the quality of our content in different languages, we have implemented a system to review and maintain our localizations. By using roles (such as reviewers and maintainers) and tools (such as GitHub and pull requests) known to our community, we make this process easier to understand and apply and, therefore, easier to contribute to. Here is an example: https://github.com/ropensci/dev_guide/pull/569

In summary: you can use automatic translation if it will be reviewed by humans with a clear set of rules.

eliocamp commented 1 year ago

I've created a repo with a summary of the discussions. Please do read it and feel free to open issues if you think something important is missing.

eliocamp commented 10 months ago

As an update: we've submitted a proposal for an R Consortium Grant to implement this.