ros2 / ros2_documentation

ROS 2 docs repository
https://docs.ros.org/en/rolling
Creative Commons Attribution 4.0 International

Multi-language support and management #3678

Open cychitivav opened 1 year ago

cychitivav commented 1 year ago

Hello,

I am a member of the Spanish ROS community, and I have been reading issue #3249. I believe that translating into multiple languages is crucial because, at one point, I was one of those individuals who struggled to learn ROS due to the language barrier in the documentation.

Translating this is a best-effort task to help people who prefer to read in their language or are unable (or find it difficult) to read in English. A translated page is beneficial, but if it's not available, the page will be in English.

Apart from the steep learning curve of ROS, the fact that the documentation is only available in English makes the learning curve even steeper. Therefore, I think it's essential to have the documentation available in multiple languages to enable more people to learn ROS.

Creating separate repositories for each language seems impractical since every update in the original repository creates conflicts when merging into individual language repositories. While exploring Sphinx's documentation on internationalization (https://www.sphinx-doc.org/en/master/usage/advanced/intl.html), I found the following approach:

[Image: diagram of the approach from the Sphinx internationalization documentation]

Process

This process can be somewhat simplified with the help of the sphinx-intl library:

  1. Upload .rst files with the documentation in English.
  2. Generate a .pot file from the .rst files using make gettext (these files would be stored in the build folder).
  3. Update the .po files with the .pot files using sphinx-intl update -p build/locale -l <language>.
  4. Translate the .po files manually or with the assistance of a translator.
  5. Build the documentation in the desired language using make -e SPHINXOPTS="-D language='<language>'" html (the output would be stored in the build/<language> folder).

Note: When compiling HTML with make html, a .mo file is generated, which contains the translation from the .po file, and the HTML page is generated with the translation.
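The steps above can be sketched as a small Python helper that only assembles the commands for steps 2, 3 and 5. The function name `translation_commands` and the `pot_dir` parameter are hypothetical; the default `build/locale` is taken from step 3 and may need adjusting to wherever `make gettext` actually writes the .pot files in a given setup.

```python
import shlex


def translation_commands(language, pot_dir="build/locale"):
    """Return the commands for steps 2, 3 and 5 as argument lists.

    Nothing is executed here; the lists can be passed to subprocess.run.
    """
    return [
        # Step 2: extract translatable strings into .pot files.
        ["make", "gettext"],
        # Step 3: create or update the .po files for one language.
        ["sphinx-intl", "update", "-p", pot_dir, "-l", language],
        # Step 5: build the HTML output in that language.
        ["make", "-e", f"SPHINXOPTS=-D language='{language}'", "html"],
    ]


if __name__ == "__main__":
    for cmd in translation_commands("es"):
        print(shlex.join(cmd))
```

To actually run the steps, each list could be passed to `subprocess.run(cmd, check=True)` from the directory that contains the Makefile.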

Structure of .po files

These files allow the translation of phrases or short paragraphs, where people with basic knowledge of English can perform the translation. Additionally, the authorship of each translation can be maintained, and changes made in each language can be tracked without affecting the English documentation (.rst files).

As described in the GNU documentation:

A PO file is made up of many entries, each entry holding the relation between an original untranslated string and its corresponding translation. All entries in a given PO file usually pertain to a single project, and all translations are expressed in a single target language. One PO file entry has the following schematic structure:

white-space
#  translator-comments
#. extracted-comments
#: reference…
#, flag…
msgid untranslated-string
msgstr translated-string

A simple entry can look like this:


#: lib/error.c:116
msgid "Unknown system error"
msgstr "Error desconocido del sistema"

In essence, the .po files are a list of entries, each having a msgid and a msgstr. The msgid represents the original English text, and the msgstr contains the translation.
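To make the msgid/msgstr pairing concrete, here is a minimal sketch of a reader for simple PO text. It is not how gettext or Sphinx parse these files: the function name `read_po_entries` is hypothetical, plural forms and msgctxt are ignored, and Python string-literal rules are used as an approximation of PO escape sequences.

```python
import ast


def read_po_entries(text):
    """Return {msgid: msgstr} for simple PO text (no plurals, no msgctxt)."""
    entries = {}
    current = {}   # fields of the entry being read
    field = None   # field that continuation lines extend

    def flush():
        if "msgid" in current:
            entries[current["msgid"]] = current.get("msgstr", "")
        current.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            flush()          # a blank line ends the current entry
            field = None
        elif line.startswith("#"):
            continue         # comments, references and flags are skipped
        elif line.startswith("msgid "):
            if "msgid" in current:
                flush()      # entries not separated by a blank line
            current["msgid"] = ast.literal_eval(line[len("msgid "):])
            field = "msgid"
        elif line.startswith("msgstr "):
            current["msgstr"] = ast.literal_eval(line[len("msgstr "):])
            field = "msgstr"
        elif line.startswith('"') and field:
            current[field] += ast.literal_eval(line)  # continuation line
    flush()
    return entries


if __name__ == "__main__":
    sample = (
        "#: lib/error.c:116\n"
        'msgid "Unknown system error"\n'
        'msgstr "Error desconocido del sistema"\n'
    )
    print(read_po_entries(sample))
```

The header entry of a real .po file (the one with an empty msgid) would appear under the empty-string key.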

Management of .po files

I have been working on a fork of this repository (https://github.com/cychitivav/ros2_documentation/tree/multilingual) to automate the generation of .po files using GitHub Actions. With some changes in the source folder, it is possible to generate .po files for multiple languages simultaneously.

Furthermore, the action includes code to extract the current status of the .po files, providing information on the number of translated and untranslated msgid entries. With this information, an issue could be automatically generated, or translations could be performed using googletrans (https://pypi.org/project/googletrans/) or similar tools.
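As an illustration of the kind of status report described above, the following sketch counts translated, fuzzy and untranslated entries in PO text. It is not the code used in the fork's action; `po_status` and `_field` are hypothetical names, and the sketch assumes entries are separated by blank lines. GNU gettext provides a comparable summary with `msgfmt --statistics`.

```python
import re


def _field(lines, name):
    """Join the quoted pieces of one field, including continuation lines."""
    pieces, active = [], False
    for ln in lines:
        if ln.startswith(name + " "):
            active = True
            pieces.append(ln[len(name) + 1:].strip()[1:-1])
        elif active and ln.startswith('"'):
            pieces.append(ln[1:-1])
        else:
            active = False
    return "".join(pieces)


def po_status(text):
    """Count translated, fuzzy and untranslated entries in simple PO text."""
    counts = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines()]
        if not any(ln.startswith("msgid ") for ln in lines):
            continue  # e.g. a block holding only comments or obsolete entries
        if not _field(lines, "msgid"):
            continue  # the header entry has an empty msgid
        is_fuzzy = any(ln.startswith("#,") and "fuzzy" in ln for ln in lines)
        if not _field(lines, "msgstr"):
            counts["untranslated"] += 1
        elif is_fuzzy:
            counts["fuzzy"] += 1
        else:
            counts["translated"] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        'msgid "Hello"', 'msgstr "Hola"', "",
        "#, fuzzy", 'msgid "Install ROS 2"', 'msgstr "Instalar ROS"', "",
        'msgid "Next steps"', 'msgstr ""',
    ])
    print(po_status(sample))
```

A report like this could feed the automatically generated issue mentioned above, one line per language.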

Modifications to msgid

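An important aspect is to identify which files require translation and ensure that existing translations are not lost. To address this, I have explored the sphinx-intl module, which allows updating the .po files based on the .pot files generated by make gettext. During the update, two scenarios can occur. If the change to a msgid is minor, the previous translation is kept and flagged as fuzzy until it is reviewed. On the other hand, if the change is significant, the translation in the .po file will be removed.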

Handling for each language

I have made some changes to the makefile so that files for multiple languages can be generated simultaneously (according to the interested translation communities). Additionally, I have placed the locale folder in the root of the repository to avoid conflicts in the action workflow.

The folder structure is as follows:

.
├── build
│   ├── gettext
│   └── html
│       ├── en
│       ├── es
│       └── fr
├── locale
│   ├── es
│   └── fr
└── source

As you can see, the locale folder contains the .po files, and the source folder remains unchanged to prevent any loops in the action or damage to the English documentation. As suggested by @fujitatomoya:

but if we take multiple language support in this repo, i would request the following architecture dependency.

  • mainline doc WILL NOT depend on any multiple language contents.
  • Only multiple language contents can refer to mainline doc.
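With the locale folder at the repository root, next to source, the Sphinx configuration has to be told where to find the catalogs. The fragment below is a sketch of what that could look like, not the fork's actual conf.py. `language`, `locale_dirs` and `gettext_compact` are standard Sphinx options; the specific values are assumptions based on the folder tree above.

```python
# Fragment for source/conf.py (values are assumptions, see above).

# Language of the .rst sources and default build language.
language = "en"

# Where Sphinx looks for message catalogs, relative to the directory
# that contains conf.py. "../locale/" assumes the root-level locale folder.
# Sphinx expects the layout <locale_dir>/<language>/LC_MESSAGES/*.po.
locale_dirs = ["../locale/"]

# Generate one .pot/.po file per source document rather than one per
# top-level directory, which keeps per-file translation diffs small.
gettext_compact = False
```

Because only the translation tooling reads the locale folder, the English build does not depend on any translated content, in line with the dependency rule quoted above.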

Feasibility

Given the change tracking performed by make gettext and sphinx-intl in the .rst files, I believe it is possible to maintain the documentation in multiple languages within a single repository once a significant portion of the documentation has been translated. This would even allow for automatic translation and community contributions to improve the translations through PRs.

This is because, if a large portion of the files is translated (either manually or automatically), minor changes in the .rst files can be handled by temporarily preserving the previous translation until it is updated (or published, if desired).

Final Comments

First and foremost, I would like to hear the maintainers' opinion on adding the .po files and the locale folder to the main repository. This would involve having a moderator for each language review every pull request, or implementing a similar process. By doing so, the authorship of each translation can be maintained, and changes made in each language can be controlled without affecting the English documentation (.rst files).

If you believe this is possible, I would like to submit a PR to the repository and await a review. I can also provide further clarification on the entire process.

It would be interesting to have a section like this on the ROS page:

[Image; see https://docs.readthedocs.io/en/stable/localization.html]

Considerations

clalancette commented 1 year ago

First, I'm sorry for the very long delay in responding.

Second, I agree with you that for the most part, it would be much better if the translations lived in the same repository as the original English documentation. Otherwise, it is going to get out of sync quickly and be hard to keep up-to-date.

However, I do have concerns that it will be hard for the current maintainers of ros2_documentation to be able to review the translations for many different languages. If we do go this route, then effectively we will merge in changes to the .po files without understanding the language they are being translated into. There is some possibility for abuse there (like putting spam in the translation), but I hope that wouldn't be a problem. Overall, I think we should do it regardless of these problems, but in the long-term I think we would want to have trusted reviewers for each language.

With all of that said, I would love to see a PR that has the changes to make this happen. We can discuss more about what this would look like there. Note that the way we build official documentation does not use a GitHub action, but instead invokes the Dockerfile at https://github.com/ros2/ros2_documentation/blob/rolling/docker/image/Dockerfile, so any solution will need to be integrated there.

cychitivav commented 1 year ago

Hi,

Thank you very much for your feedback.

Of course, I have created the Pull Request (#3829). For now, I'll keep it as a draft while I make some changes, as the version I had is outdated. Regarding GitHub Actions, I only use it to update the PO files, and the only change in Docker is adding the sphinx-intl package to the requirements.txt file.

fujitatomoya commented 1 year ago

thank you very much for the detailed explanation!

i would like to ask a couple of questions,

On the other hand, if the change is significant, the translation in the .po file will be removed,

this sounds like changes to the mainline doc can easily break the multiple language docs? this works for mainline doc maintainers, but could it be a problem for multiple language doc maintainers?

i am not so familiar with sphinx multiple language support, so maybe i am mistaken for some parts...

cychitivav commented 1 year ago

Hi @fujitatomoya,

I had forgotten to mention that these .po files are not only for Sphinx; they are used in many internationalization workflows because they separate the code or formatting from the translation. I bring this up because these files can be managed with several tools, one of which is GNU gettext. Using this tool, you can obtain statistics on how many msgid entries still need translation. While I'm not an expert with these files, I believe that identifying them isn't a problem. Still, it needs to be checked properly, which is why I've left the PR as a draft for now.

Regarding synchronization, I've been working on a GitHub Actions workflow to perform updates and generate a report every time a commit is made in the source folder. However, I think it's not ideal and warrants discussion.

Lastly, when a msgstr is removed, it's usually because the original text has changed significantly, and it's highly likely that the previous translation wouldn't be accurate. If the concern is about minor changes with typographical errors or punctuation, a 'fuzzy' flag is activated for this, allowing a decision between keeping the original text or the previous translation. If a text lacks translation, it would be displayed in English, ensuring the documentation is always up to date. Nonetheless, the issue lies in potentially having pages with mixed languages.
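The fallback behaviour described above can be summarized in a few lines of Python. This is an illustration of the rule only, not code from Sphinx or gettext; the function `displayed_text` and its parameters are hypothetical. In Sphinx, the corresponding switch is the `gettext_allow_fuzzy_translations` option, which is off by default.

```python
def displayed_text(msgid, msgstr, fuzzy=False, allow_fuzzy=False):
    """Pick the text a reader would see for one PO entry."""
    if not msgstr:
        # No translation yet: fall back to the English original.
        return msgid
    if fuzzy and not allow_fuzzy:
        # The source changed slightly and the old translation is unreviewed.
        return msgid
    return msgstr


if __name__ == "__main__":
    print(displayed_text("Install ROS 2", ""))
    print(displayed_text("Install ROS 2", "Instalar ROS", fuzzy=True))
    print(displayed_text("Install ROS 2", "Instalar ROS", fuzzy=True, allow_fuzzy=True))
    print(displayed_text("Install ROS 2", "Instalar ROS 2"))
```

Since the rule is applied per entry, a page where only some entries are translated ends up with mixed languages, which is the drawback noted above.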

I hope this addresses your concerns, and that I haven't forgotten to mention anything.

pxalcantara commented 6 months ago

hi @cychitivav I strongly agree with your points about the importance of multi-language support for the tutorials. Is there any update on this subject?