matplotlib / mpl-brochure-site

Source for the top-level landing page.

Translation of matplotlib core project website. #93

Open steppi opened 1 month ago

steppi commented 1 month ago

Hi Matplotlib team,

With support from the CZI Scientific Python Community and Communications Infrastructure grant, Quansight Labs is able to offer assistance with developing and publishing translations of the brochure websites for the Scientific Python Core Project. You may have seen that translations into some languages are already available for https://numpy.org/, with a version switcher in the top right corner; we're offering to help core projects integrate something similar into their websites. Our aim is to accomplish this in a way that requires minimal effort from the core project maintainers.

I've been tasked with setting up the software infrastructure for translations, and am posting issues today to ask core project teams if they would like to participate. I've published an FAQ here: https://scientific-python-translations.github.io/faq/, with more information. Please take a look and let me know if you have any questions. If you decide to participate, I can begin setting up the translation infrastructure, which will require no work or input from maintainer teams (See here in the FAQ for more details on the process).

I had briefly discussed this translation project with the team on gitter last year. At that point there was some uncertainty because I did not know how to set up the translation infrastructure for Matplotlib's website, but since then I've become more experienced with Crowdin, the translation management platform we are using. Setting things up shouldn't be an issue any longer.

story645 commented 1 month ago

Hi, transferred this to the brochure site repo because that's where the work would be done. My major question is can we exclude content - I understand translating release announcements, but wonder at the maintenance cost of translating something like a "hey, here's our GSOC person" - especially when the accompanying post with details isn't translated.

steppi commented 1 month ago

Thanks @story645 for redirecting this to the right place!

Hi, transferred this to the brochure site repo because that's where the work would be done. My major question is can we exclude content - I understand translating release announcements, but wonder at the maintenance cost of translating something like a "hey, here's our GSOC person" - especially when the accompanying post with details isn't translated.

Good question. Yes, it's possible to get fine-grained control over what should and shouldn't be translated. Depending on how involved the Matplotlib team would like to be, you could give a general idea of the sort of things that should be excluded and we can use our judgment to decide on the details, which you could sign off on, or you can give more precise instructions.

story645 commented 1 month ago

Hi @steppi, so the biggest concern/hesitancy is on the cost/benefit of doing these translations.

Can we get more info on who the audience is for these translations, given that the rest of the documentation is in English and machine translations of technical documents are often somewhat passable? Is this aimed at decision makers - where it's very important that the translations are high quality and therefore Crowdin's AI pitch is worrying - or at regular users, who are probably already running the homepage through Google Translate and would likely get more use out of translated cheatsheets? And are there stats on the benefits of providing the homepage in a different language?

Where I'm coming from is that it's hard not to project the maintainers' own experiences, where everyone for whom English isn't their primary tongue still works heavily in English. Which also leads me to have some worries about the politics around which languages we are choosing to translate into (where we skew heavily Eurocentric if we only choose languages maintainers know.)

Also have a couple of more long term questions:

  1. who is responsible for maintaining and verifying translations when the funding ends?
  2. how will it work w/ our existing infrastructure and can it be easily pulled out?
    1. slight tangent maybe but thinking a bit about the fan translation communities for manga and anime (and similar from other countries) and how they all operate on a sorta template + script infrastructure. Which raises the question: how are translation notes handled? (Or do we just rework the copy b/c it shouldn't be that complicated in the first place?)

steppi commented 1 month ago

Hi @story645. I appreciate the thought and care you've put into thinking through whether it makes sense for matplotlib to participate. These are all very good questions. I will try to answer to the best of my ability but am also pinging @melissawm and @rgommers, who may be able to provide more information. My role in this project is primarily on the infrastructure side, and I am mostly relaying what I have picked up while working on that side of things.

Can we get more info on who the audience is for these translations

My understanding is that the primary audience for these translations will be potential new users from countries where levels of English proficiency tend to be low. So one should imagine beginners for whom accurate knowledge of what the core projects do and how they are useful will be beneficial in helping them understand whether they should invest time in learning them. One could imagine an active member of the X-speaking user community sending a link to a translated brochure website in order to encourage someone to start using the project. I see it as a tool that could help the existing users strengthen their communities. That the translations are official and published by the projects themselves can give an impression that the project actually has these communities in mind, and they are not "on their own", so to speak. I have had it related to me by @rgommers that such gestures have led to X-speaking users clustering around certain projects.

Which also leads me to have some worries about the politics around which languages we are choosing to translate into (where we skew heavily Eurocentric if we only choose languages maintainers know.)

We hope to deliberately seek out languages with large communities with no or low English proficiency. I think this report from EducationFirst can help provide a rough understanding of which languages may be most useful. At the moment, numpy.org is translated into Portuguese and Japanese, and one sees from the report that Brazil and Japan are both listed as having low proficiency.

  1. who is responsible for maintaining and verifying translations when the funding ends?

The funding is primarily for setting up the infrastructure and starting the organization of translation communities. There is a large fixed effort needed to get things up and running, but once things are in place, we believe that we can continue to maintain things as volunteers, or find other volunteers to take our place. We may also seek out additional small sources of funding going forward. We certainly do not want this work to fall back to project maintainers when the CZI funding runs out. This is meant to remain a broader Scientific Python initiative, and we will be working on a SPEC for how that would work.

2. how will it work w/ our existing infrastructure and can it be easily pulled out?

I will do my best as the engineer to ensure that I set things up in such a way that they can easily be pulled out. I know your existing infrastructure is more complex than that of some of the other projects, and if I'm somehow unable to do such a thing, I think it's perfectly understandable that your team would not want to participate. I'm currently unable to experiment with how things will work unless I have permission to set up your project on Crowdin. Doing so would entail no commitment to participate.

2. slight tangent maybe but thinking a bit about the fan translation communities for manga and anime (and similar from other countries) and how they all operate on a sorta template + script infrastructure. Which raises the question: how are translation notes handled? (Or do we just rework the copy b/c it shouldn't be that complicated in the first place?)

I think Crowdin provides similar infrastructure. Here's an example of what the UI looks like for untranslated content. The text is segmented into strings which can be translated one by one, and suggestions are provided based on machine translations. Crowdin can be synced to a GitHub repo, and will open a running PR which adds the translations to the documentation hosted there.

image

Let me know if you have any other questions. If you're still skeptical, you can forgo deciding for a while, waiting until you see the results for other projects.

story645 commented 1 month ago

So one should imagine beginners for whom accurate knowledge of what the core projects do and how they are useful will be beneficial in helping them understand whether they should invest time in learning them.

This is making me wonder if we should pilot the translation tooling on cheatsheets, given the content is in line w/ what I see folks sharing online in other languages and we've seen demand for it (https://github.com/matplotlib/cheatsheets/pull/132). Sorta thinking this out: just as PyLadies Brasil did a translation, I could see other groups making a project of translating the cheat sheets and using em in practice, in a way where I don't know if they'd get the same utility out of translating the homepage, given it is essentially marketing copy. Just like the homepage, there isn't much in the way of actual text on the cheatsheets. And we could always advertise on our homepage that -> hey we have these cheatsheets that are essentially an overview of our features in all these languages.

ETA: and if we go cheatsheets, we'd probably first need to refactor so that the text is fed in via script/the code blocks don't need to be recopied for every version.
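To make the "text is fed in via script" idea concrete, here is a minimal sketch using only the Python standard library. It is not how the cheatsheets repo is actually built: `SHEET_TEMPLATE`, `LABELS`, and `render_sheet` are hypothetical names, and the Portuguese labels are illustrative placeholders rather than reviewed translations. The point is only that the code snippets live in one shared template while the translatable labels come from a per-language table.

```python
from string import Template

# Hypothetical shared template: the code snippets appear once and stay
# untranslated; only the $placeholders vary by language.
SHEET_TEMPLATE = Template(
    "$title\n"
    "$quick_start\n"
    "    fig, ax = plt.subplots()\n"
    "    ax.plot(x, y)\n"
    "$save\n"
    "    fig.savefig('figure.pdf')\n"
)

# Hypothetical label tables. The "pt" entries are placeholders for
# illustration, not vetted translations.
LABELS = {
    "en": {"title": "Matplotlib cheat sheet", "quick_start": "Quick start", "save": "Save"},
    "pt": {"title": "Dicas de Matplotlib", "quick_start": "Início rápido", "save": "Salvar"},
}


def render_sheet(lang: str, fallback: str = "en") -> str:
    """Fill the shared template, using fallback labels for anything missing in lang."""
    labels = {**LABELS[fallback], **LABELS.get(lang, {})}
    return SHEET_TEMPLATE.substitute(labels)


if __name__ == "__main__":
    print(render_sheet("pt"))
```

With a layout like this, adding a language would mean adding one label table, and an unknown or partially translated language falls back to the English labels instead of failing.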

melissawm commented 1 month ago

@story645 I am happy to work with you in prioritizing the cheatsheets if that seems like the right approach. @steppi did a good job explaining the main motivators above, and I also want to mention that as a non-native English speaker, there is also an added aspect of inclusion with finding websites translated to your own language, which I also think is important. Machine translation is not quite there yet, and I also see this as a point of entry for new contributors who can join the project as translators and later move into different roles if they are interested. Happy to chat more about a strategy that works for you folks! 😄

steppi commented 1 month ago

One thing that came out of discussions at the summit last week is that while we can have the content of the cheatsheets translated, it may be necessary to make adjustments to the graphic design in order to fit everything onto the sheet in a harmonious manner. For example, text may overflow the boxes and sections may end up occupying a different amount of space.

image
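As a rough illustration of how overflow could be flagged before anyone touches the layout, here is a sketch that checks whether each translated label still wraps into its box. Everything in it is an assumption: `fits_box`, the box size, and the sample labels are hypothetical, and counting characters is only a crude proxy for rendered width, which really depends on the font and the layout engine.

```python
import textwrap


def fits_box(text: str, width_chars: int, max_lines: int) -> bool:
    """Return True if text wraps into at most max_lines lines of width_chars characters."""
    return len(textwrap.wrap(text, width=width_chars)) <= max_lines


# Hypothetical labels and box size, for illustration only.
labels = {"en": "Quick start", "de": "Schnelleinstieg für Einsteiger"}
BOX = {"width_chars": 14, "max_lines": 1}

# Languages whose label would need a layout adjustment under this box size.
overflowing = [lang for lang, text in labels.items() if not fits_box(text, **BOX)]

if __name__ == "__main__":
    print(overflowing)
```

A check like this would only narrow down which sections to look at; the actual fit would still need to be judged on a rendered sheet.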

story645 commented 1 month ago

there is also an added aspect of inclusion with finding websites translated to your own language, which I also think is important

I 100% get that, but could folks also get that out of the cheat sheets if we have a clear "hey, we have this resource in multiple languages" signpost on our home page? Granted, I asked about translations at this week's new contributor meeting & there was support for translating the homepage but there was also confusion on where the translations stopped.

Machine translation is not quite there yet

So yeah, for my own experiment I translated the homepage into Russian & it's like "accurate(ish) but not how a human would translate or write any of this", so I know first-hand how painful machine translation is. That's actually also why I was thinking cheat sheets - b/c it's a resource we intend folks to go back to, so machine translations would start grating in a way they wouldn't for a read-once resource like the home page. Dunno, was thinking of a student who used to translate my slides.

also see this as a point of entry for new contributors who can join the project as translators

Totally, and I think they could do that regardless of where we launch the translations...b/c the plan is that this is a pilot and we'd broaden out to other parts.

story645 commented 1 month ago

may be necessary to make adjustments to the graphic design in order to fit everything onto the sheet in a harmonious manner.

that's fair/makes sense/if we go that route we can ask @rougier if he has thoughts on a more flexible layout.

rougier commented 3 weeks ago

For the cheatsheet this might be relatively easy, since most text relates directly to matplotlib function/parameter names and these cannot be translated (can they?). For example, on the screenshot above, only the section labels and "how do I..." content would need translation I think. Best would be to have a first translation and see how it fits.

story645 commented 3 weeks ago

Best would be to have a first translation and see how it fits.

Portuguese (Pyladies Brazil): https://github.com/pyladies-brazil/matplotlib-dicas

Was talking to @QuLogic and @ksunden on this week's call about this, and one rough idea was whether we could transition to something more web native that could still spit out nice printed sheets (since the primary motivation is still something folks can hand out at a tutorial or print out and put by their desk). Or something that's closer to our other docs infrastructure (so Sphinx), so that whatever gets built for these is easier to carry over to our other docs and this doesn't have to be a very special one-off.

rougier commented 3 weeks ago

Looks nice, only one page has an overflow at the bottom right. I agree a web native design would make things easier to tweak.