auto scan for translation strings

JohnRDOrazio commented 11 months ago

Rather than manually adding translation strings to the public/locales/en/translation.json file, and trying to keep the other language files in sync with the English one, we could implement a simple scanner that is triggered either on commit/push or when a PR is opened.

For example: https://github.com/i18next/i18next-scanner

I've run a few tests locally, but I'm running into a couple of issues that I haven't managed to overcome yet:

- The tags within <Trans> components (such as <p>, <b>, <i>, <strong>, etc.) are being replaced with indexed tags (<0>, <1>, ...), even though I set supportBasicHtmlNodes to true and keepBasicHtmlNodesFor to ['br', 'strong', 'i', 'p', 'b', 'vatican', 'github']. I have tried a few things, all to no avail:
  - setting the corresponding options in the i18n initialization (transSupportBasicHtmlNodes and transKeepBasicHtmlNodesFor)
  - setting the components parameter on the Trans component
  i18next-scanner is supposed to play nicely with Trans components, but I haven't had much luck with it yet.
- The defaultValue configuration option works fine for t() functions, but for Trans components the English string value is being written to every language file, English or not. Instead, non-English language files should get an empty value, while the source string is still correctly set in the English language file. Perhaps this needs to be handled within the customTransform function?
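A simplified model of why the tags end up indexed may help with the first issue. As I understand react-i18next's documented behavior, a tag listed in transKeepBasicHtmlNodesFor is only kept by name when the element has no attributes and at most a single plain-text child; anything else is replaced by its position among its siblings. The TypeScript sketch below reproduces only that rule, using a hypothetical TransNode type instead of real JSX; it is not the actual code of i18next-scanner or react-i18next.

```typescript
// Hypothetical, simplified stand-in for a JSX child: plain text or an element.
type TransNode =
  | string
  | { type: string; props: Record<string, string>; children: TransNode[] };

// Builds the default string for a Trans component's children.
// A listed tag is kept by name only if it has no props and its content is
// empty or a single text node; otherwise it becomes <index>...</index>,
// where index is the child's position among its siblings.
function serializeTransChildren(children: TransNode[], keep: string[]): string {
  return children
    .map((child, index) => {
      if (typeof child === "string") {
        return child;
      }
      const hasProps = Object.keys(child.props).length > 0;
      const simpleContent =
        child.children.length === 0 ||
        (child.children.length === 1 && typeof child.children[0] === "string");
      const inner = serializeTransChildren(child.children, keep);
      if (keep.includes(child.type) && !hasProps && simpleContent) {
        // Kept basic HTML node, e.g. <br/> or <strong>text</strong>.
        return child.children.length === 0
          ? `<${child.type}/>`
          : `<${child.type}>${inner}</${child.type}>`;
      }
      // Everything else is indexed, even if its tag name is in `keep`.
      return `<${index}>${inner}</${index}>`;
    })
    .join("");
}

const sample = serializeTransChildren(
  ["Read the ", { type: "a", props: { href: "/docs" }, children: ["docs"] }, " ", { type: "strong", props: {}, children: ["now"] }],
  ["a", "strong"],
);
console.log(sample);
```

Under this model, a tag with attributes (such as a link with an href) or a <strong> wrapping another element is always indexed, whatever keepBasicHtmlNodesFor contains.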

I have created a new branch to track progress on this: feature/i18n-scanner

JohnRDOrazio commented 11 months ago

I won't turn the feature/i18n-scanner branch into a PR until it has been tested further and PR #57 is completed, since the feature/i18n-scanner branch was built on top of that one. It already scans the new strings introduced in PR #57 and adds any missing keys to the other language files.

JohnRDOrazio commented 11 months ago

An example of how I'm using i18next-scanner in the Liturgical Calendar project:

1) https://github.com/Liturgical-Calendar/LiturgicalCalendarFrontend/blob/main/gulpfile.js
2) https://github.com/Liturgical-Calendar/LiturgicalCalendarFrontend/blob/main/.github/workflows/i18n-scanner.yml

However, I am not using any Trans components in the Liturgical Calendar project, which keeps that workflow simpler. Handling Trans components will take some more investigation.

JohnRDOrazio commented 11 months ago

To make for a cleaner commit history, I deleted the feature/i18n-scanner branch and the related PR, and created a new, cleaner branch feature/i18next-scanner with the related PR #60. It still needs some testing with regard to Trans components...

JohnRDOrazio commented 11 months ago

I see that the Parser class does have a parseTransFromString method. Using this method, I am at least able to get some feedback on the Trans components found in the various source files, though there do seem to be a few issues:

- one user reports that, when the extensions property is set to a non-null value in the general options, the customTransform function is only called after a first pass has already been done ( https://github.com/i18next/i18next-scanner/issues/173 )
- another user reports that Trans components seem to be parsed multiple times, rather than once per component ( https://github.com/i18next/i18next-scanner/issues/226 )

These issues may explain some of the oddities of how the Trans component is being parsed...

JohnRDOrazio commented 11 months ago

Well, I seem to have made some progress: I created a separate Parser instance for transforming Trans components (see commit ae8d280), and the HTML nodes are now preserved instead of being replaced with indexed nodes. I'm not sure exactly what did the trick, but using a separate Parser instance finally seems to work.

JohnRDOrazio commented 11 months ago

However, I'm still not able to set defaultValue to "" for languages other than English...
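One way to express the desired behavior is a per-language default value. If I read the i18next-scanner README correctly, its defaultValue option accepts either a string or a function; assuming the function is called as (lng, ns, key, options), and assuming options.defaultValue carries the source string extracted from the code, a function like the hypothetical defaultValueFor below returns the source string for English and an empty string for every other language. Whether the scanner actually routes Trans components through it is exactly the open question here, so treat this as a sketch, not a confirmed fix.

```typescript
// Assumed shape of the options object passed to a defaultValue function;
// only `defaultValue` (the source string found in the code) is read here.
interface KeyOptions {
  defaultValue?: string;
}

// Hypothetical name for the project's source language.
const SOURCE_LANGUAGE = "en";

// Returns the value to store for `key` in language `lng`: the source
// string for the source language, an empty string for every other language.
function defaultValueFor(
  lng: string,
  _ns: string,
  key: string,
  options: KeyOptions = {},
): string {
  if (lng === SOURCE_LANGUAGE) {
    // Fall back to the key itself when no source string was extracted.
    return options.defaultValue ?? key;
  }
  return "";
}

// Example: what each language file would receive for one extracted key.
for (const lng of ["en", "it", "es"]) {
  const value = defaultValueFor(lng, "translation", "welcome", { defaultValue: "Welcome!" });
  console.log(lng, JSON.stringify(value));
}
```

Keeping the English branch separate from the fallback branch means a new language only needs to be added to the scanner's language list; it automatically receives empty values.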

kas-catholic commented 11 months ago

Perhaps this is just because I'm still not that familiar with Weblate, but can you help me understand what problem we're trying to solve here?

What happens (in Weblate) today if I merge a PR that adds an en translation but is missing the other languages? I'd expect Weblate to show this as a missing translation, but maybe it doesn't?
What does i18next-scanner do that solves the problem for us? Does it just insert empty values into the other language files?

kas-catholic commented 11 months ago

Ah, I just noticed we broke stuff here https://github.com/kas-catholic/confessit-web/pull/57

I should have noticed in review that we were adding strings without adding them to en.json. Would Weblate work (detect the new strings) if we were to add the strings to en.json in a PR like that?

JohnRDOrazio commented 11 months ago

That's exactly why I've been looking into i18next-scanner, so you don't have to worry about manually adding keys to en.json.

JohnRDOrazio commented 11 months ago

Rather than adding them manually, I would keep working on PR #60, which will take care of the missing strings.

JohnRDOrazio commented 11 months ago

Weblate will take care of adding missing keys to the translation files, as long as they are present in the source file (en/translation.json).

What this solves is keys that are missing from en/translation.json. Rather than adding them manually (and perhaps forgetting one), the scanner ensures that every key used in the source code is defined in the source translation file, and it can optionally remove unused keys.

Say, for example, you change a component, or decide to handle a certain string differently and wind up changing its key, or splitting it into multiple translation strings each with its own key, or simply add a new component with its own translation strings and new keys... Whatever the case, you can always be sure that the source translation file contains exactly the keys that are needed.
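The key synchronization described above can be sketched as a pure function over flat key/value objects. This is an illustration of the idea, not i18next-scanner's implementation: it assumes flat (non-nested) keys, assumes the strings found in the code are the source of truth for English, and the names syncResource and removeUnused are hypothetical.

```typescript
// A flat translation file: key -> translated string.
type Resource = Record<string, string>;

// keysInCode: key -> source (English) string, as collected by a scanner.
// existing: the current contents of one language's translation.json.
// Returns the new contents for language `lng`.
function syncResource(
  keysInCode: Resource,
  existing: Resource,
  lng: string,
  sourceLng = "en",
  removeUnused = true,
): Resource {
  const result: Resource = {};
  for (const [key, sourceText] of Object.entries(keysInCode)) {
    if (lng === sourceLng) {
      // The source file mirrors the strings found in the code.
      result[key] = sourceText;
    } else {
      // Keep an existing translation; otherwise add the key with an empty value.
      result[key] = existing[key] ?? "";
    }
  }
  if (!removeUnused) {
    // Carry over keys that no longer appear in the code.
    for (const [key, value] of Object.entries(existing)) {
      if (!(key in result)) {
        result[key] = value;
      }
    }
  }
  return result;
}

const codeKeys: Resource = { greeting: "Hello", farewell: "Goodbye" };
const italian = syncResource(codeKeys, { greeting: "Ciao", oldKey: "Vecchio" }, "it");
console.log(JSON.stringify(italian));
```

With removeUnused left at its default, a renamed or deleted key disappears from every file, while translations of keys that still exist are preserved.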

JohnRDOrazio commented 11 months ago

fixed in PR #60

kas-catholic / confessit-web

auto scan for translation strings #58