Open brian-smith-tcril opened 1 year ago
@brian-smith-tcril I looked up for tools to validate JSON translation files, but found none online.
Does it make sense to convert json to po-files and validate them? I'm not sure if there's a two-way conversion between the two format.
I know that Apache superset has a po2json utility, but I haven't tested it.
@OmarIthawi it looks like translate-toolkit could work, it supports converting to and from json.
I was hoping there would be a way to validate without converting out of json by just using the formatjs cli directly, but that doesn't seem to be possible.
I'm not sure if there are any strange edge cases that will trip up the validation with the conversion back and forth, but I think trying out validation with translate-toolkit's json2po and po2json is a reasonable path forward.
It seems that there has been a regression in validation. For example, see this translation where the slot/param name was translated:
We have tooling in https://github.com/openedx/i18n-tools that's supposed to catch this sort of thing. It currently picks up about 300 errors in edx-platform alone. Is there some reason this is no longer in use?
@timmc-edx We have tests in place that is supposed to match i18n-tools
, however it sounds like it needs more work.
I couldn't make i18n-tools work with non-standard file structure such as translations/edx-platform/...
therefore I used msgfmt
directly.
What do you suggest?
I don't know exactly what msgfmt checks, but I think it's just basic file structure. i18n_tool checks a variety of other things, including whether there are the same parameter slots in the inputs and outputs. You can see all the things it checks here: https://github.com/openedx/i18n-tools/blob/master/i18n/validate.py
I think it would be appropriate to modify that tool to allow different directory structures (or even just expect a different one).
Yes, it seems like msgfmt
is only a part of i18n-tools/blob/master/i18n/validate.py.
I'll keep this issue open for now. Unless someone works on it, I plan to fit within my Open Source contributions late August / early September.
For reference, here are the strings that i18n_tool
currently picks out as broken in edx-platform: https://gist.github.com/timmc-edx/eeaa4c1a40ee07871049cba53b30b4fa
Thanks @timmc-edx. That's mostly machine translation issues. I recall the Arabic strings to be of high quality and haven't had such issues in the past.
The solution I have in mind is to:
Fixing them by hand all at once might be prohibitively costly.
What do you think?
I think we could do it by hand -- I bet there's a way to bulk find-and-unapprove translations in Transifex, and we could use search strings like <
and notranslate
to find them. I'd be fine doing that work in between other tasks, and I'd really like to avoid having the site untranslated in those languages for the duration.
But yeah, I think the first step would be adding i18n_tool to the CI step so that translation updates are blocked until they get cleaned up.
@timmc-edx there's a way to bulk review/unreview/delete transaltions. But it'll be one language at a time.
I have capacity issues at the moment regarding my contributions time i.e. I've done much more than planned this month and I need to focus on other projects at hand.
But yeah, I think the first step would be adding i18n_tool to the CI step so that translation updates are blocked until they get cleaned up.
Yes.
If help is there, I'd really appreciate it. Otherwise I'd have to clear my plate a bit.
@OmarIthawi mentioned an example where invalid strings could break a build here.
When the strings were checked into the repos directly, this would fail the individual repo CI. With the switch to
openedx-translations
/openedx-atlas
, strings updated in transifex and synced toopenedx-translations
will not trigger a CI build on any given repo.We'll want to ensure we don't have strings that break builds checked into
openedx-translations
, so we should add validation somewhere.In the same comment, @OmarIthawi mentioned
That seems like a good plan to me