Open dcherian opened 3 weeks ago
Thanks @dcherian for moving this to the right place! I'd be happy to answer any questions anyone has.
Thanks!
A couple of (open-minded :) ) questions:
Thanks for the questions @max-sixty! Yes, the plan is just to have the front page translated, not the full docs. Regarding keeping things updated, in the FAQ I wrote
Limiting the scope of content to translate for each project to a core set of relatively static things which a competent volunteer can translate for one language in one to two days will help in making it manageable to keep translations up-to-date. Crowdin integrations discussed earlier also help in the efforts to keep translations up-to-date. Crowdin will become aware of any changes in the project website content and notify translators that there are new strings available to translate. The key factor however, will be in fostering robust communities of volunteer translators willing to put in the work to maintain translations, and if strong enough communities develop, it may even become feasible to expand the scope of content to translate.
but I think something important to keep in mind, which I left out, is that the grant funding is for the initial work of setting up the infrastructure and helping to organize a community of volunteer translators. Since the translators are already volunteers, we won't have to worry about them disappearing when the funding runs out. We may try to secure additional funding for organizing and coordination work, in which case it may be possible to expand the scope of content to translate, but even without additional funding, the organizing work needed to help keep the existing translations up to date should be small enough that it could be continued on a volunteer basis.
- Are they substantially better than an LLM translation? I'd imagine LLMs are a bit worse but they can be kept up to date easily.
In the FAQ I wrote
Machine translations can still struggle with the technical language and jargon of scientific computing. Examples include machine translation having difficulty with terms which are commonly referred to by their English names in other languages, such as Git jargon like “branch” and “commit”; and machine translation producing incorrect literal translations of technical phrases. Translators who worked on the numpy.org project have attested to these challenges. While machine translations provide a good starting point for translators, for content of this nature, it is valuable to have a human in the loop to ensure accuracy and consistent quality. In addition, publishing official translations can help make new non-English speakers feel more welcome, increasing their likelihood of participating in user communities.
and I think this still holds for LLM translations, unless perhaps one takes care in one's prompt engineering to work around these issues. In that case, I think one would still want to review the translations to verify the details. The numpy.org translators worked with and improved machine translations in a similar way through the Crowdin translation management platform. Having a human in the loop to review and improve machine translations seems like the best option, and there doesn't seem to be a shortage of potential volunteers who would be interested in this work for the small scale we propose.
Thanks @steppi ! That sounds good. (I realize you added a link to the FAQ, sorry for having missed that...)
Great to hear! No worries about missing the FAQ link; sorry for burying it in a wall of text :)
What is your issue?
From @steppi in https://github.com/xarray-contrib/xarray.dev/issues/673