pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Translations of xarray core project website #9094

Open dcherian opened 3 weeks ago

dcherian commented 3 weeks ago

What is your issue?

From @steppi in https://github.com/xarray-contrib/xarray.dev/issues/673

Hi xarray team,

With support from the CZI Scientific Python Community and Communications Infrastructure grant, Quansight Labs is able to offer assistance with developing and publishing translations of the brochure websites for the Scientific Python Core Projects. You may have seen that translations into some languages are already available for numpy.org, with a version switcher in the top right corner; we're offering to help core projects integrate something similar into their websites. Our aim is to accomplish this in a way that requires minimal effort from the core project maintainers.

I've been tasked with setting up the software infrastructure for translations, and am posting issues today to ask core project teams if they would like to participate. I've published an FAQ here: scientific-python-translations.github.io/faq, with more information. Please take a look and let me know if you have any questions. If you decide to participate, I can begin setting up the translation infrastructure, which will require no work or input from maintainer teams (See here in the FAQ for more details on the process).

steppi commented 3 weeks ago

Thanks @dcherian for moving this to the right place! I'd be happy to answer any questions anyone has.

max-sixty commented 3 weeks ago

Thanks!

A couple of (open-minded :) ) questions:

steppi commented 3 weeks ago

Thanks for the questions @max-sixty! Yes, the plan is just to have the front page translated, not the full docs. Regarding keeping things updated, in the FAQ I wrote

Limiting the scope of content to translate for each project to a core set of relatively static things which a competent volunteer can translate for one language in one to two days will help in making it manageable to keep translations up-to-date. Crowdin integrations discussed earlier also help in the efforts to keep translations up-to-date. Crowdin will become aware of any changes in the project website content and notify translators that there are new strings available to translate. The key factor however, will be in fostering robust communities of volunteer translators willing to put in the work to maintain translations, and if strong enough communities develop, it may even become feasible to expand the scope of content to translate.

but I think something important that I left out is that the grant funding is for the initial work of setting up the infrastructure and helping to organize a community of volunteer translators. Since the translators are already volunteers, we won't have to worry about them disappearing when the funding runs out. We may try to secure additional funding for organizing and coordination work, in which case it may be possible to expand the scope of content to translate. But even without additional funding, the organizing work needed to keep the existing translations up to date should be small enough that it could be continued on a volunteer basis.

  • Are they substantially better than an LLM translation? I'd imagine LLMs are a bit worse but they can be kept up to date easily.

In the FAQ I wrote

Machine translations can still struggle with the technical language and jargon of scientific computing. Examples include machine translation having difficulty with terms which are commonly referred to by their English names in other languages, such as Git jargon like “branch” and “commit”; and machine translation producing incorrect literal translations of technical phrases. Translators who worked on the numpy.org project have attested to these challenges. While machine translations provide a good starting point for translators, for content of this nature, it is valuable to have a human in the loop to ensure accuracy and consistent quality. In addition, publishing official translations can help make new non-English speakers feel more welcome, increasing their likelihood of participating in user communities.

and I think this still holds for LLM translations, unless perhaps one takes care in one's prompt engineering to work around these issues. Even then, I think one would still want to review the translations to verify the details. The numpy.org translators worked with and improved machine translations in a similar way through the Crowdin translation management platform. Having a human in the loop to review and improve machine translations seems like the best option, and there doesn't seem to be a shortage of potential volunteers interested in this work at the small scale we propose.

max-sixty commented 3 weeks ago

Thanks @steppi ! That sounds good. (I realize you added a link to the FAQ, sorry for having missed that...)

steppi commented 3 weeks ago


Great to hear! No worries about missing the FAQ link; sorry for burying it in a wall of text :)