responsible-ai-collaborative / aiid

The AI Incident Database seeks to identify, define, and catalog artificial intelligence incidents.
https://incidentdatabase.ai

Create dashboard/report for tracking i18n translations #2890

Open kepae opened 3 months ago

kepae commented 3 months ago

Some resources we use for internationalization of strings are separated from one another without an easy way to track coverage of strings in each language – for example, in individual JSON files: https://github.com/responsible-ai-collaborative/aiid/blob/98da6125d26a25ef06cea5dbce398142b4471a10/site/gatsby-site/i18n/locales/fr/footer.json.

In addition, often a given string/interface is available in one language but not another.

We should develop a way to gather all i18n resources and intuitively display the coverage of strings in a report or dashboard-like interface.

We can produce this report as part of an action workflow, or perhaps when building the site it can be produced as a static page.
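As a rough illustration of what such a report could compute, here is a minimal sketch, not a proposed implementation. It assumes each locale lives in its own folder (as in i18n/locales/fr/) and that each JSON file is a flat key-to-string map; the function names `load_locales` and `coverage_report` are hypothetical.

```python
import json
from pathlib import Path


def load_locales(locales_dir):
    """Return {locale: set of "file.json:key" ids} for every locale folder."""
    coverage = {}
    for locale_dir in sorted(Path(locales_dir).iterdir()):
        if not locale_dir.is_dir():
            continue
        keys = set()
        for json_file in sorted(locale_dir.glob("*.json")):
            data = json.loads(json_file.read_text(encoding="utf-8"))
            # Assumption: flat files; only entries with a non-empty value count.
            keys.update(f"{json_file.name}:{k}" for k, v in data.items() if v)
        coverage[locale_dir.name] = keys
    return coverage


def coverage_report(coverage):
    """Render a plain-text table: locale, translated count, share of all known keys."""
    all_keys = set().union(*coverage.values())
    lines = [f"{'locale':<8}{'translated':>12}{'coverage':>10}"]
    for locale, keys in sorted(coverage.items()):
        pct = 100 * len(keys) / len(all_keys) if all_keys else 100.0
        lines.append(f"{locale:<8}{len(keys):>12}{pct:>9.1f}%")
    return "\n".join(lines)


if __name__ == "__main__":
    # Path taken from the repository layout referenced in this issue.
    print(coverage_report(load_locales("site/gatsby-site/i18n/locales")))
```

The "all known keys" baseline here is simply the union of keys across locales, so a string missing from every locale would not show up. The same text output could be written to a workflow summary or rendered as a static page.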

Improving internationalization is also a common entry point for would-be contributors to get involved! As a community, we should make this easier for them.

lmcnulty commented 2 months ago

I think there are a few different situations we would like to be able to find:

  1. site/gatsby-site/i18n/locales/fr/translation.json has a translation for the string "Hello world", but site/gatsby-site/i18n/locales/es/translation.json does not.
    • This is pretty simple and just requires running a script comparing the JSON files.

  2. A certain code file contains <Trans>Hello world</Trans> but there is no key for "Hello world" in any translation file.
    • This requires some kind of tooling that can parse i18next strings, of which there are several. I've tried a few of them and they all seem to miss some strings, maybe due to whitespace issues, which is annoying.

  3. A certain code file contains the user-visible <h1>Hello world</h1>, but it's not wrapped in a <Trans> or a t().
    • I'm pretty sure an LLM could find these cases reasonably effectively if we periodically fed it our code.
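For the first situation, the comparison could look something like the sketch below. It assumes flat key-to-string JSON files and treats an empty string as untranslated; `missing_translations` and `compare_files` are hypothetical names, not existing project scripts.

```python
import json
from pathlib import Path


def missing_translations(reference, target):
    """Keys with a non-empty value in `reference` but none in `target`."""
    return sorted(k for k, v in reference.items() if v and not target.get(k))


def compare_files(reference_path, target_path):
    """Load two translation files and list what the target is missing."""
    reference = json.loads(Path(reference_path).read_text(encoding="utf-8"))
    target = json.loads(Path(target_path).read_text(encoding="utf-8"))
    return missing_translations(reference, target)


if __name__ == "__main__":
    fr = {"Hello world": "Bonjour le monde", "About": "À propos"}
    es = {"About": "Acerca de"}
    print(missing_translations(fr, es))  # ['Hello world']
```

Running `compare_files` over every pair of locales (or each locale against a chosen reference) would give the per-language gaps.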

I'm not sure what the purpose is of having multiple translation files per locale. It seems like most things go in translation.json but some things, like the submission form, get their own files. I guess it's to make it easier to find a string from a particular place in the code? But it seems to me like it would be easier to have everything in one file and just ctrl+F when you're looking for a particular string.

kepae commented 2 months ago

Your breakdown is great. :-)

I would emphasize tackling the issues in that order -- 1. existing i18n files, 2. <Trans> tags (and other i18next calls) with no matching translation key, and 3. text not yet prepared for i18n.
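For step 2, a very rough sketch of the check is below. It uses a regular expression rather than a real JSX parser, so it is only an approximation: it ignores t() calls, i18nKey props, self-closing tags, and <Trans> bodies containing nested components or interpolation. It collapses whitespace on both sides before comparing, on the assumption that whitespace differences are what trips up existing tools. All function names are hypothetical.

```python
import re

# Matches <Trans ...>body</Trans>; the lookbehind skips self-closing <Trans ... />.
TRANS_RE = re.compile(r"<Trans\b[^>]*(?<!/)>(.*?)</Trans>", re.DOTALL)


def normalize(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def trans_strings(source):
    """Return the normalized bodies of all <Trans> elements found in `source`."""
    return {normalize(body) for body in TRANS_RE.findall(source)}


def untranslated(source, translation_keys):
    """<Trans> strings in `source` with no matching key after normalization."""
    known = {normalize(key) for key in translation_keys}
    return sorted(trans_strings(source) - known)


if __name__ == "__main__":
    jsx = """
    <p>
      <Trans>
        Hello
        world
      </Trans>
    </p>
    <Trans>About</Trans>
    """
    print(untranslated(jsx, {"About"}))  # ['Hello world']
```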

> But it seems to me like it would be easier to have everything in one file and just ctrl+F when you're looking for a particular string.

I agree. I'm used to having centralized "strings" files. I'm not sure if there is a method to the compartmentalization, but this is something to consider cleaning up when considering improving our i18n.
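If consolidation were attempted, one risk is the same key appearing in two files with different translations. A minimal sketch of a merge that surfaces those conflicts instead of silently overwriting is below; it assumes flat JSON files, and `load_locale_dir` and `merge_namespaces` are hypothetical helpers. Any code that requests a specific file (an i18next namespace) by name would also need updating, which this sketch does not address.

```python
import json
from pathlib import Path


def load_locale_dir(locale_dir):
    """Return {filename: parsed JSON} for every .json file in one locale folder."""
    return {
        path.name: json.loads(path.read_text(encoding="utf-8"))
        for path in Path(locale_dir).glob("*.json")
    }


def merge_namespaces(namespaces):
    """Merge {filename: {key: value}} into one dict.

    Returns (merged, conflicts). The first value seen (in filename order) wins;
    conflicts maps each disputed key to the list of differing values.
    """
    merged, conflicts = {}, {}
    for name in sorted(namespaces):
        for key, value in namespaces[name].items():
            if key in merged and merged[key] != value:
                conflicts.setdefault(key, [merged[key]]).append(value)
            else:
                merged[key] = value
    return merged, conflicts
```

A non-empty `conflicts` result would need a human decision before the files are collapsed into a single translation.json.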