Closed peterbe closed 3 years ago
Roughly 31% of all content is en-US (based on a MySQL query on a July 31 snapshot). I estimate that it's higher amongst the non-archived content; roughly 40%.
We haven't maintained or even tested building anything but en-US. Dealing with archived content hasn't been a priority so we might break when rendering the less
The hardest thing about supporting non-en-US content is trust, because merging a PR is much more powerful than editing in a <textarea> web app.
We could write a bot that auto-merges content PRs, but this will require much better/tighter integration with linters and CI so that we don't auto-merge a PR that touches "dangerous" things. This is non-trivial work and is best done later rather than treated as a prerequisite.
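A minimal sketch of the kind of gate such a bot would need before merging. It assumes, purely for illustration, that "dangerous" means any changed file outside a content directory or with a non-document file type. The `files/` root, the suffix list, and the function name are hypothetical and not Yari's actual layout or rules.

```python
from pathlib import PurePosixPath

# Assumptions for illustration only: content lives under "files/" and a
# "safe" content PR touches nothing but document sources and raster images.
CONTENT_ROOT = "files"
SAFE_SUFFIXES = {".html", ".md", ".png", ".jpg", ".gif"}


def is_safe_to_auto_merge(changed_paths):
    """Return True only if every changed path looks like plain content."""
    if not changed_paths:
        return False  # nothing to judge; leave it to a human
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.is_absolute() or ".." in path.parts:
            return False  # refuse anything that could escape the content root
        if not path.parts or path.parts[0] != CONTENT_ROOT:
            return False  # e.g. CI config, build scripts, package.json
        if path.suffix.lower() not in SAFE_SUFFIXES:
            return False
    return True
```

A path check like this only covers where a PR writes, not what it writes, so it would complement the linters and CI rather than replace them.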
We need to write some new code to inject a "banner" on archived pages that tells people who stumble onto the page that they're viewing archived content. To accomplish this we need:
We don't have any tools to help translators find out which docs need translations. The Wiki has/had the Doc_status dashboard but this does not exist in Yari at all.
Whatever we do, don't ever close the door entirely on some day having all content translated. For example, let's keep the en-US prefix in all existing document URLs.
We can shut down the Wiki sooner.
All the URLs will continue to work including localized slugs.
Technically, people can still continue to make edits to the localized content by sending PRs on the archived-content git repo. (But it won't get the same CI or the same periodicity of build-to-prod.)
If we ever decide to re-instate a locale, we can. Nothing is really lost.
We don't have the same kind of localization tools as the Wiki had (Doc_status) and we won't have any time soon.
If we move the localized content from the MySQL Wiki to an archived-content git repo, we can shut down the Wiki sooner.
If we don't need to fully support localized content (aka translated documents) we can get away with keeping the chrome hardcoded in English which makes things much simpler.
We'll lose traffic from Google searches. If we no longer put /sv-SE/docs/Foo/bar in the sitemap XML files, it's less likely Google will present them.
Localizers might still want to attempt to "waste" time making PRs on the archived content and expect speedy reviews.
Since what we intend to do with Yari is that traffic will come in to CloudFront and Lambda@Edge will conditionally look up whether a page is statically built in S3 or otherwise fall back to Django, we could do the "jamstack thing" for all English documents and keep serving the non-English documents with the existing Kuma. I.e. put English Yari as a mask/layer on top of good old Kuma (plus Wiki) so that English is served from Yari and Japanese is served from Django.
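To make the layering concrete, here is a rough sketch of the routing decision as a plain Python function. It is not Lambda@Edge code. The set of `static_keys` stands in for "is this page statically built in S3?", and the function name and return labels are made up for illustration.

```python
def choose_origin(path, static_keys):
    """Decide which backend should serve a request path.

    `static_keys` is a hypothetical stand-in for an S3 existence check.
    """
    parts = path.strip("/").split("/")
    locale = parts[0].lower()
    is_doc = len(parts) >= 2 and parts[1] == "docs"
    if is_doc and locale == "en-us" and path in static_keys:
        return "static"  # English document that was statically built
    return "kuma"  # everything else falls back to Django


built = {"/en-US/docs/Web/CSS"}
print(choose_origin("/en-US/docs/Web/CSS", built))
print(choose_origin("/ja/docs/Web/CSS", built))
```

Under these assumptions the first call is served statically and the second falls through to Kuma, which is the "mask/layer" behaviour described above.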
This approach is the least disruptive but it's not without complications and headaches. For example, the localized content thrives on the English document as the "parent" document and Doc_status tooling will eventually break. We also won't be able to connect translated content to the English one and vice versa since they will live in different universes.
The original plan for Yari was that we'd do all languages just like the Wiki. Instead of 11k documents, we'd have to have about 30-40k documents in git. And we were going to make the chrome localized (@fluent/react + Mozilla Pontoon). And instead of the Doc_status dashboards in the Wiki we were going to write some brand new tools to help translators direct their attention to documents that need translations or updates.
We also had plans for automation that can quantitatively figure out if a translation is "bad" (i.e. compare code examples, non-prose keywords, heading counts, etc). Perhaps we could also write a toolbar or web UI tool so you can pair two documents literally side by side next to each other (one pane in English, the other pane a giant <textarea>).
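A rough sketch of what such a quantitative check could look like, using only the standard library. It compares two of the signals mentioned above: heading counts and the text of `<pre>` code examples. The function names are hypothetical, and treating any difference as a warning is an assumption rather than an agreed rule.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class _Outline(HTMLParser):
    """Count headings and collect the text inside <pre> blocks."""

    def __init__(self):
        super().__init__()
        self.headings = 0
        self.code_blocks = []
        self._in_pre = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self.headings += 1
        elif tag == "pre":
            self._in_pre = True
            self.code_blocks.append("")

    def handle_endtag(self, tag):
        if tag == "pre":
            self._in_pre = False

    def handle_data(self, data):
        if self._in_pre:
            self.code_blocks[-1] += data


def outline(html):
    parser = _Outline()
    parser.feed(html)
    parser.close()
    return parser.headings, [c.strip() for c in parser.code_blocks]


def translation_warnings(english_html, translated_html):
    """Return reasons a translation may have drifted from the English."""
    en_headings, en_code = outline(english_html)
    tr_headings, tr_code = outline(translated_html)
    warnings = []
    if en_headings != tr_headings:
        warnings.append(f"heading count differs: {en_headings} vs {tr_headings}")
    if en_code != tr_code:
        warnings.append("code examples differ")
    return warnings
```

Translators sometimes localize comments inside code examples, so a real tool would likely need a fuzzier comparison than strict equality.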
But one thing we never considered is: who's going to merge and sign off on a git PR when they don't understand what it says? Perhaps the simple workaround would be to just merge it as long as it doesn't appear to introduce external links to sites that look shady. Another "workaround" is to leverage the Mozilla Reps program or Mozilla's own L10n team to drum up and develop a chain of trust, so that we have non-Mozilla-staff contributors we trust to at least review PRs but leave the merging to someone who is staff.
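The "no shady external links" workaround could be partly mechanized. The sketch below lists hosts that a PR's new revision links to but the old revision did not, minus an allow-list. The `trusted` set and all function names are assumptions for illustration, and deciding what counts as "shady" is still left to a person.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collect the hostnames of every <a href> in a document."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        try:
            host = urlsplit(href).hostname
        except ValueError:
            host = "<unparseable>"  # flag malformed URLs rather than skip them
        if host:  # relative links have no hostname and are ignored
            self.hosts.add(host)


def external_hosts(html):
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.hosts


def new_untrusted_hosts(before_html, after_html, trusted):
    """Hosts linked in the new revision, absent from the old, and not trusted."""
    introduced = external_hosts(after_html) - external_hosts(before_html)
    return sorted(h for h in introduced if h not in trusted)
```

An empty result would not prove a PR is fine, but a non-empty one gives a reviewer who can't read the language something concrete to look at.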
The other harsh truth is that for the past 6+ months we've been developing the Yari prototype by focusing on English. That means that a lot of language-related features haven't gotten their fair share of testing, experimentation and problem-solving. Who knows, will you get out-of-memory errors in CI if you try to build all locales?
There's also the potential of using machine learning (aka. external APIs) to somehow have an external service translate English HTML documents to various languages and then we serve them. This is not without some immediate drawbacks:
All of the above is totally feasible and would even be quite fun to work on. External APIs or UI tools, it's just engineering. Translators who enjoy contributing would be happy. We would maximize our SEO footprint by having all the translations. It would make us not look so "North America global" if we support "every" language. Glory all around!
But building all of this will take time. Many months. With a reduced development team, assume you have to double the time until we get to a working solution. In the meantime, the Wiki is open. With the doors wide open and nobody employed to police the un-reviewed production edits, the quality of MDN will deteriorate more and more and ruin all the hard work that has gone into establishing MDN as the best source of truth for learning about and looking up web development.
Note-to-self; We need a special banner for non-English content that indicates quite loudly that it's not actively maintained. For example, the "Edit in GitHub" link needs to NOT appear on these pages.
Here's the issue about naming things: https://github.com/mdn/yari/issues/1242
The code that dumps translated content to a repo separate from both the archived and the active-English repos is already in place.
The next action is:
This is too old now. We have built most of the things we need up until now. We already have the stuff in place and working for freezing translated content.
https://github.com/mdn/yari/pull/1673 is an interesting idea of uplifting translated content.
At this point, this issue isn't really helping us make progress so I'm going to close it.
(this issue isn't firm as an actual action item)
Here's a plan for archiving all localized content:
!= en-US will also be archived (*).
(*) Reminder: archived content goes into a different git repository. It's the rendered-out HTML only (the document meat, not the page), but the original source Kuma-HTML is saved. Its URLs are not included in search or sitemaps.