pkp / pkp-lib

The library used by PKP's applications OJS, OMP and OPS, open source software for scholarly publishing.
https://pkp.sfu.ca
GNU General Public License v3.0

Replace bespoke translation toolset with more standards-based options #4779

Closed: hsluoyz closed this issue 4 years ago

hsluoyz commented 5 years ago

Currently, all translations are done in XML files, as mentioned in https://github.com/pkp/pkp-lib/issues/4029#issuecomment-417907420, which makes it very inefficient for translators to translate strings or to sync a few items across dozens of XML files.

Is there any chance we could use a more advanced online translation platform like https://crowdin.com/ ? With Crowdin, translators do all their work in a web browser and don't need to track which strings have not been translated yet. Translations are deployed automatically as a new git commit. Can we consider it?

NateWr commented 5 years ago

My understanding is that the workflow software we're adopting will commit changes back to the project. Even if translation tweaks are modified outside of git, one of us still has to commit it, right?

asmecher commented 5 years ago

What I mean is that direct modifications to the locale files are one of the most frequent tweaks made by end users. (We do have the custom locale plugin to help avoid this, but it's not universally used, nor very well polished.)

NateWr commented 5 years ago

Ahh..... hmm. What about a cached file that could be cleared through the admin like template/css cache?

asmecher commented 5 years ago

A PHP data cache would be fine, much as we already have for locale XML; the existing data reset tool would work for that as well. It would be a change in expectations, though. If we synchronize it with the move to XLIFF, it might be something we could tuck into a new workflow without needing a second round of disruptions...

marcbria commented 5 years ago

Files don't necessarily have to be combined into one in order to load them automatically -- either at once or on-demand. For example, an index could link keys to files and they could be loaded when an unloaded key is requested.
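The idea of an index that maps each key to the file defining it, with a file parsed only when one of its keys is first requested, can be sketched as below. This is a minimal Python sketch under stated assumptions, not PKP code: the class name `LazyLocale`, the in-memory `FILES` dict standing in for locale files, and the `##key##` marker for unknown keys are all hypothetical.

```python
from typing import Callable, Dict


class LazyLocale:
    """Load a locale file only when one of its keys is first requested."""

    def __init__(self, index: Dict[str, str],
                 load_file: Callable[[str], Dict[str, str]]) -> None:
        self._index = index            # key -> name of the file that defines it
        self._load_file = load_file    # parses one file into {key: string}
        self._loaded: Dict[str, Dict[str, str]] = {}
        self.files_parsed = 0          # counts real file loads, for illustration

    def translate(self, key: str) -> str:
        filename = self._index.get(key)
        if filename is None:
            return "##" + key + "##"   # visible marker for an unknown key
        if filename not in self._loaded:
            self._loaded[filename] = self._load_file(filename)
            self.files_parsed += 1
        return self._loaded[filename].get(key, "##" + key + "##")


# Hypothetical in-memory "files"; a real loader would parse XML or PO from disk.
FILES = {
    "common": {"common.save": "Save", "common.cancel": "Cancel"},
    "submission": {"submission.title": "Title"},
}
# Building the index needs one pass over every file; the index itself can be cached.
index = {key: name for name, messages in FILES.items() for key in messages}
locale = LazyLocale(index, lambda name: dict(FILES[name]))

print(locale.translate("common.save"))  # only the "common" file is parsed here
```

The trade-off is that the index must be rebuilt whenever keys move between files, which is why it pairs naturally with the cache discussed further down the thread.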

Thanks Nate. I hadn't thought of this. :+1:

My understanding is that the workflow software we're adopting will commit changes back to the project. Even if translation tweaks are modified outside of git, one of us still has to commit it, right?

To clarify this: yes, "commit back/forward to/from the project" is one of the main goals the translation server needs to accomplish automatically, but... what else can be "done outside git"? I mean, changes will be made either in the translation server (by translators) or in git (by developers), won't they?

A PHP data cache would be fine, much as we already have for locale XML; the existing data reset tool would work for that as well. It would be a change in expectations, though. If we synchronize it with the move to XLIFF, it might be something we could tuck into a new workflow without needing a second round of disruptions...

Sorry... I'm completely lost here. @asmecher, do you think you can simplify this for dummies? If it's not possible... no worries. I can perfectly well live without knowing about this. :-)

The remaining problems will probably be resolved by adopting crowdin et al.

Thanks @jonasraoni for your "two cents". They make sense to me, but let's see what the dev folks say. :-)

Just one comment: about Crowdin, I pointed out licensing issues here, which is why I encourage using Weblate instead.

Finally... @asmecher, in parallel to the deep-dev discussion, do you think it's safe to go with "stage 1"?

If it's OK with you, @MarcRiera and I planned to start working on this in September, so I hope we can clarify those questions and do the Weblate configuration/testing this month.

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

NateWr commented 5 years ago

Do you think it's safe to go with "stage 1"?

I think it's safe to go ahead with the stage 1 proposal as Alec has described it (https://github.com/pkp/pkp-lib/issues/4779#issuecomment-524495563). All of our discussions are about how to improve things beyond stage 1.

do you think you can simplify this for dummies?

We're talking about how to automatically build a file that will tell us where to look for translations. So, for example, when the code hits __('my.locale.string'), the application will know which locale file to parse and load. That way we don't have to load every translation file every time (which is not performant), but we also don't have to manually load the correct one (which is prone to mistakes).

The solution we're discussing regarding a PHP data cache is similar to how CSS and Smarty (.tpl) files are built and cached, because rebuilding them for every page load is not performant.
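To make the "build once, reuse until cleared" idea concrete, here is a minimal Python analogue of such a data cache. It is a sketch under assumptions, not the PHP implementation: locale sources are assumed to be JSON files of key -> string, the merged cache is a single JSON file, and `get_cached_locale` / `clear_cache` are hypothetical names.

```python
import json
import os
import tempfile
from typing import Dict, List


def load_messages(path: str) -> Dict[str, str]:
    # Hypothetical source format: one JSON object {key: string} per locale file.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def get_cached_locale(sources: List[str], cache_path: str) -> Dict[str, str]:
    """Return all messages merged, rebuilding the cache file only when it is stale."""
    if os.path.exists(cache_path):
        cache_mtime = os.path.getmtime(cache_path)
        if all(os.path.getmtime(src) <= cache_mtime for src in sources):
            with open(cache_path, encoding="utf-8") as f:
                return json.load(f)  # fast path: no source file changed
    merged: Dict[str, str] = {}
    for src in sources:
        merged.update(load_messages(src))  # later files override earlier ones
    with open(cache_path, "w", encoding="utf-8") as f:
        json.dump(merged, f)
    return merged


def clear_cache(cache_path: str) -> None:
    """What an admin "clear data cache" action would do: force a rebuild next time."""
    if os.path.exists(cache_path):
        os.remove(cache_path)


workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "common.json")
cache = os.path.join(workdir, "locale-cache.json")
with open(source, "w", encoding="utf-8") as f:
    json.dump({"common.save": "Save"}, f)

messages = get_cached_locale([source], cache)  # first call builds the cache
```

Comparing modification times is coarse (an edit made in the same second as the cache write can go unnoticed on some filesystems), which is one reason an explicit reset, like the existing data reset tool, remains useful for people who hand-edit locale files.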

asmecher commented 5 years ago

If it's OK with you, @MarcRiera and I planned to start working on this in September, so I hope we can clarify those questions and do the Weblate configuration/testing this month.

@marcbria, the plan outlined in the 2 stages removes the need for us to check whether Weblate works with symbolic keys -- I think we're OK on that front. But if you could run a quick Weblate test with e.g. the French XLIFF samples I've generated, that would be excellent -- the only thing we need to ensure is that Weblate will preserve the <unit id="..."> attribute while editing.
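One way to run that check is to compare the `<unit id="...">` values of a file before and after it has been edited in Weblate. A minimal Python sketch using only the standard library; the sample documents below are made up for illustration, and in practice you would feed it the original and the Weblate-edited versions of a sample file.

```python
import xml.etree.ElementTree as ET
from typing import List

XLIFF2_NS = "urn:oasis:names:tc:xliff:document:2.0"


def unit_ids(xliff_text: str) -> List[str]:
    """Return the id of every <unit> in an XLIFF 2.0 document, in document order."""
    root = ET.fromstring(xliff_text)
    # A missing id becomes "", so a dropped attribute shows up as a mismatch.
    return [unit.get("id", "") for unit in root.iter("{%s}unit" % XLIFF2_NS)]


def ids_preserved(before: str, after: str) -> bool:
    """True if editing changed only text, not the set of symbolic keys."""
    return sorted(unit_ids(before)) == sorted(unit_ids(after))


TEMPLATE = (
    '<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" '
    'srcLang="en-US" trgLang="fr-FR"><file id="f1">{units}</file></xliff>'
)
ORIGINAL = TEMPLATE.format(units=(
    '<unit id="common.save"><segment><source>Save</source>'
    '<target>Sauver</target></segment></unit>'))
EDITED = TEMPLATE.format(units=(
    '<unit id="common.save"><segment><source>Save</source>'
    '<target>Enregistrer</target></segment></unit>'))
BROKEN = TEMPLATE.format(units=(
    '<unit><segment><source>Save</source>'
    '<target>Enregistrer</target></segment></unit>'))
```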

marcbria commented 5 years ago

We're talking about how to automatically build a file that will tell us where to look for translations. So, for example, when the code hits __('my.locale.string'), the application will know which locale file to parse and load. That way we don't have to load every translation file every time (which is not performant), but we also don't have to manually load the correct one (which is prone to mistakes).

Muuuch more clear now. Thanks @NateWr. ;-)

the plan outlined in the 2 stages removes the need for us to check whether Weblate works with symbolic keys -- I think we're OK on that front.

But are we completely sure about moving to PO (stage 2)? Even Nextcloud (IMHO one of the best PHP projects ever) is avoiding PO and moving to JSON.

And yes... XLIFF was originally thought of as a "transport" format, but in our case it would mean only a minimal change from our native XML format. Other projects started this way, and we could talk with the OASIS XLIFF folks and ask them to encourage people to go this way.

If you three (@asmecher , @NateWr and @ctgraham) are sure about this I won't ask again, but I can't stop thinking we are moving in the wrong direction.

But if you could run a quick Weblate test with e.g. the French XLIFF samples I've generated, that would be excellent -- the only thing we need to ensure is that Weblate will preserve the <unit id="..."> attribute while editing.

Thank you both. We can run this test, but I think @MarcRiera already did it and said the key is preserved.

About Weblate, I was concerned about all the other features we listed as requirements (push/pull, workflows, permissions/roles, glossaries, translation memories...). We wrote them down somewhere, but I can't find it now. @mtub, do you have a copy somewhere?

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

Did you skip answering this question on purpose? ;-)

marcbria commented 5 years ago

I found the notes Marco took in Heidelberg: slides.html.txt

And here is the comparison we made between Weblate and Transifex (including requirements): https://docs.google.com/spreadsheets/d/1rSp350oJEEb6PYOfjpMzQnlTNGOH_UiJDpZU2GbXWbE/edit#gid=2143124171

asmecher commented 5 years ago

If we can set up the server, plus the code you wrote to make OJS understand XLIFF... it seems feasible to announce the translation server at PKPBCN19, doesn't it?

Did you skip answering this question on purpose? ;-)

Yup, I'm planning to include some XLIFF-compatible tweaks that interested parties can experiment with for the 3.2 release.

asmecher commented 5 years ago

The Gettext library feature adding support for XLIFF unit IDs is now merged: https://github.com/oscarotero/Gettext/pull/221#event-2634582256

asmecher commented 5 years ago

Another blocker, unfortunately :) https://github.com/oscarotero/Gettext/issues/224

asmecher commented 5 years ago

Latest update:

I'm tinkering with both formats (XLIFF and PO) in Weblate, because Weblate supports both monolingual (symbolic locale keys) and bilingual (main language in the source code, mapped from there to secondary languages) modes for both file formats. (See https://docs.weblate.org/en/latest/formats.html for the list.)

Commands to batch-convert the XML locale files to XLIFF:

for locale in `ls locale`; do for file in `fgrep -l locale.dtd locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToXliff.php `echo $file.xml | sed -e "s/$locale/en_US/"` $file.xml $file.xlf; done; done
for locale in `ls lib/pkp/locale`; do for file in `fgrep -l locale.dtd lib/pkp/locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToXliff.php `echo $file.xml | sed -e "s/$locale/en_US/"` $file.xml $file.xlf; done; done
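The commands above pass each translated file together with its en_US counterpart to `lib/pkp/tools/xmlToXliff.php`, whose internals aren't shown here. Purely as an illustration of what such a pairing involves, here is a Python sketch that builds an XLIFF 2.0 document with the en_US text as `<source>`, the translation as `<target>`, and the symbolic key as the unit id. The `<locale><message key="...">` input layout and the single `<file id="f1">` wrapper are assumptions for the example, not a description of the PHP tool's output.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XLIFF2_NS = "urn:oasis:names:tc:xliff:document:2.0"


def read_locale_xml(text: str) -> Dict[str, str]:
    """Map key -> text for a <locale><message key="..."> file (assumed layout)."""
    root = ET.fromstring(text)
    return {m.get("key", ""): (m.text or "") for m in root.iter("message")}


def qualified(tag: str) -> str:
    return "{%s}%s" % (XLIFF2_NS, tag)


def to_xliff2(source: Dict[str, str], target: Dict[str, str],
              src_lang: str, trg_lang: str) -> str:
    """Build an XLIFF 2.0 document: one <unit> per en_US key, id = symbolic key."""
    ET.register_namespace("", XLIFF2_NS)  # serialize without an ns0: prefix
    root = ET.Element(qualified("xliff"),
                      {"version": "2.0", "srcLang": src_lang, "trgLang": trg_lang})
    file_el = ET.SubElement(root, qualified("file"), {"id": "f1"})
    for key, text in source.items():
        unit = ET.SubElement(file_el, qualified("unit"), {"id": key})
        segment = ET.SubElement(unit, qualified("segment"))
        ET.SubElement(segment, qualified("source")).text = text
        if key in target:  # untranslated keys simply have no <target>
            ET.SubElement(segment, qualified("target")).text = target[key]
    return ET.tostring(root, encoding="unicode")


en_us = read_locale_xml(
    '<locale name="en_US"><message key="common.save">Save</message>'
    '<message key="common.cancel">Cancel</message></locale>')
fr_fr = read_locale_xml(
    '<locale name="fr_FR"><message key="common.save">Enregistrer</message></locale>')
xliff = to_xliff2(en_us, fr_fr, "en-US", "fr-FR")
```

Note that XLIFF's `srcLang`/`trgLang` expect BCP 47 tags (`en-US`), not PKP's `en_US` locale codes.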

I'm still favouring XLIFF because our XLIFF files are bog-standard, rather than stepping outside the spec, as monolingual PO files do (even if they're used in some projects in practice).

asmecher commented 5 years ago

Yikes, it looks like Weblate may not support XLIFF 2.0!

asmecher commented 5 years ago

Commands to batch-convert XML to PO:

for locale in `ls locale`; do for file in `fgrep -l locale.dtd locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToPo.php $file.xml $file.po; done; done
for locale in `ls lib/pkp/locale`; do for file in `fgrep -l locale.dtd lib/pkp/locale/$locale/*.xml | cut -d "." -f 1`; do php lib/pkp/tools/xmlToPo.php $file.xml $file.po; done; done
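In a monolingual PO file, `msgid` holds the symbolic key and `msgstr` holds the translation, rather than `msgid` holding English text. The Python sketch below shows the shape of such a conversion; the XML layout and the minimal header are assumptions, and it is not a port of `xmlToPo.php`.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def read_locale_xml(text: str) -> Dict[str, str]:
    """Map key -> text for a <locale><message key="..."> file (assumed layout)."""
    root = ET.fromstring(text)
    return {m.get("key", ""): (m.text or "") for m in root.iter("message")}


def po_quote(text: str) -> str:
    """Quote a string as a PO literal, escaping backslashes, quotes, tabs, newlines."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\t", "\\t"))
    return '"' + escaped + '"'


def to_monolingual_po(messages: Dict[str, str], language: str) -> str:
    """msgid holds the symbolic key and msgstr the translation (monolingual PO)."""
    lines = [
        'msgid ""',
        'msgstr ""',
        '"Content-Type: text/plain; charset=UTF-8\\n"',
        '"Language: ' + language + '\\n"',
        "",
    ]
    for key, text in messages.items():
        lines += ["msgid " + po_quote(key), "msgstr " + po_quote(text), ""]
    return "\n".join(lines)


po = to_monolingual_po(read_locale_xml(
    '<locale name="fr_FR"><message key="common.save">Enregistrer</message>'
    '<message key="common.quote">Il a dit "oui"</message></locale>'), "fr_FR")
print(po)
```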

asmecher commented 5 years ago

A teaser :)


marcbria commented 5 years ago

@asmecher, this is impressive!! It's almost finished! Is it too crazy to think that we'll be able to do the OJS 3.2 es_ES and ca_ES translations in Weblate? :-)

I haven't found time to contact @MarcRiera and run the tests we promised. I will be a little more relaxed at the end of the month...

Cheers, m.

asmecher commented 5 years ago

@marcbria wrote:

I haven't found time to contact @MarcRiera and run the tests we promised.

Because https://github.com/oscarotero/Gettext works with XLIFF 2.0, but Weblate seems to only work with XLIFF 1.2, I've chosen (at least for now) to focus on "monolingual PO" as our chosen format. So if you were planning to experiment with the sample XLIFF, I'd suggest holding off on that for now. Here are some sample PO files -- Weblate appears to work well with them in monolingual mode.

marcbria commented 5 years ago

I missed this one: :-(

Yikes, it looks like Weblate may not support XLIFF 2.0!

So this other one made me think you had settled on XLIFF and were succeeding with it:

I'm still favouring XLIFF because our XLIFF files are bog-standard, rather than stepping outside the spec, as monolingual PO files do (even if they're used in some projects in practice).

I just asked on Weblate's GitHub whether they are planning to support XLIFF 2.0 any time soon.

Otherwise, I'm unsure about the options we have here: a) move directly to PO; b) look for a different free-software translation server; c) find a way to downgrade to XLIFF 1.2; d) ...

Looking into the differences between the two XLIFF specifications, the downgrade (c) would be complex. We looked hard [1] but didn't find any good free-software alternative (b) to do the job... so does that mean we need to go with (a)?

Please Alec, let us know if we can help with something.

[1] Mojito looks promising and supports XLIFF 2.0, but it's still very basic compared to Weblate.

asmecher commented 5 years ago

I just asked on Weblate's GitHub whether they are planning to support XLIFF 2.0 any time soon.

Here's an already-open issue for XLIFF 2.0 support in Weblate: https://github.com/WeblateOrg/weblate/issues/972

I'm OK to go with a) move directly to PO, as long as everyone understands that we're going to be using monolingual PO files rather than bilingual PO files. This is not how PO files were initially intended, but there are projects that use them this way, and Weblate includes support for it.
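The practical difference of the monolingual mode is that a translated file alone doesn't say what the English text is: a tool has to read the en_US file alongside it. A minimal Python sketch of that pairing follows; the regex reader is deliberately tiny (single-line entries, no unescaping) and is only for illustration, since a real tool should use a complete PO parser such as the `polib` library.

```python
import re
from typing import Dict, Tuple

# Matches single-line msgid/msgstr pairs only; escapes are left as-is.
ENTRY = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)


def parse_simple_po(text: str) -> Dict[str, str]:
    """Tiny PO reader for illustration; real files need a full parser."""
    return {key: value for key, value in ENTRY.findall(text) if key}  # skip header


def pair_with_base(base: Dict[str, str],
                   translation: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """key -> (English source from the base file, translation or "")."""
    return {key: (source, translation.get(key, "")) for key, source in base.items()}


# Monolingual: msgid is a symbolic key, so the English text lives in the en_US file.
EN_US = 'msgid "common.save"\nmsgstr "Save"\n\nmsgid "common.cancel"\nmsgstr "Cancel"\n'
FR_FR = 'msgid ""\nmsgstr ""\n\nmsgid "common.save"\nmsgstr "Enregistrer"\n'
# A bilingual file would instead read: msgid "Save" / msgstr "Enregistrer".

rows = pair_with_base(parse_simple_po(EN_US), parse_simple_po(FR_FR))
```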

asmecher commented 5 years ago

Please Alec, let us know if we can help with something.

Yes, if it's possible to start putting together a production-capable Weblate install for us to use, that would be very helpful :)

marcbria commented 5 years ago

We have multiple options here:

  1. We have offered our journals' production server (huge CPU, plenty of disk space and memory) to host the Weblate Docker container (with daily backups).
  2. Weblate itself offers a SaaS option.
  3. Their documentation mentions Bitnami and YunoHost.
  4. We can talk with other PKP partners with more resources to host the server.

Which do you prefer? If we go with a Docker approach and later decide to move from one place to another, the migration should be trivial.

Cheers, m.

marcbria commented 5 years ago

OK... I couldn't resist the temptation. A server with the latest Weblate version is up and running at: http://revistes.uab.es:8081

I'm sending you the login credentials by mail, along with some notes about the Docker configuration. We still need to set up the git push/pull feature (between you and me, I have no idea how it is supposed to work), but we can worry about that later, right?

BTW, if everything is as advanced as you show, I offer myself and my team as guinea pigs to do the es_ES and ca_ES OJS 3.2 translations on the brand-new server.

See you soon in Barcelona, m.

marcbria commented 5 years ago

BTW, it looks like XLIFF 2.0 is not widely implemented and is not backwards compatible with XLIFF 1.2, so the PO solution is the more standard one.

I like XLIFF a lot (I'm still surprised it is mostly used as a "transport" format and only a few projects use it natively), but the fact is that only a few free CAT tools support XLIFF 2.0, so IMHO it wouldn't be a good idea to work with XLIFF 2.0 if our translators can't use their favourite external tools.

I missed a question you asked in an earlier post:

I'm OK to go with a) move directly to PO, as long as everyone understands that we're going to be using monolingual PO files rather than bilingual PO files. This is not how PO files were initially intended, but there are projects that use them this way, and Weblate includes support for it.

I think we are fine with this (as you said, some projects work this way), but let me ask @MarcRiera whether it's a safe route.

asmecher commented 5 years ago

@NateWr and I discussed how to stage this out and roughly decided:

  1. Review/merge changes into master with select locales converted to PO (en_US, fr_FR, es_ES, de_DE).
  2. Test/document translation process using these translations.
  3. Fork a pre-conversion branch that tardy XML translations can be submitted to, but don't advertise it :) These can be converted, with some headache, if needed.
  4. Batch convert all remaining translations
  5. Translation round for 3.2 using weblate!
  6. Around the 3.3 release mark, remove backwards-compatibility tools (https://github.com/pkp/pkp-lib/issues/5090)

@NateWr, for step 1, could you look at... https://github.com/pkp/ojs/pull/2479 https://github.com/pkp/pkp-lib/pull/5107

asmecher commented 5 years ago

(Obviously I'll generate PRs for OMP and probably PPS once we're ready for a merge.)

NateWr commented 5 years ago

This looks great, with a remarkably small impact on the codebase outside of the locale files. :+1:

One question I had was how editing will work during development. Will I modify the en_US PO files myself, similar to how it's done now with the XML files? Or do these need to be generated from something?

Also, is there any tooling (po, gettext, weblate, etc) that will automatically identify changed/removed en_US strings, so I don't have to delete these from other locales when committing changes?
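Whatever tool ends up doing it, detecting strings removed from en_US boils down to comparing key sets, which monolingual files make easy because every entry is identified by its symbolic key. A minimal Python sketch, assuming each locale file has already been parsed into a dict of key -> string; the function names are hypothetical.

```python
from typing import Dict, List


def obsolete_keys(base: Dict[str, str], translation: Dict[str, str]) -> List[str]:
    """Keys still present in a translation but removed from the en_US base."""
    return sorted(set(translation) - set(base))


def prune_obsolete(base: Dict[str, str], translation: Dict[str, str]) -> Dict[str, str]:
    """Drop obsolete keys, keeping the translation's original order."""
    return {key: text for key, text in translation.items() if key in base}


en_us = {"common.save": "Save", "common.close": "Close"}
fr_fr = {"common.save": "Enregistrer", "common.oldLabel": "Ancien libellé"}

print(obsolete_keys(en_us, fr_fr))
```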

marcbria commented 5 years ago

Translation round for 3.2 using weblate!

@asmecher, if we manage to do it before PKPBCN19, I'll pay for all the beers you can drink after the sprint (not before, because during the sprint we still need your brain) ;-)

hsluoyz commented 5 years ago

Any update?

marcbria commented 5 years ago

Server installed. Working on configuration. Code updated. Testing soon. Goal? Translate OJS 3.2 with Weblate. Follow this thread for detailed info: https://github.com/pkp/ojs/pull/2479

asmecher commented 5 years ago

Notes to self: On using import_json to create translation components...

  1. Generate the JSON. Use:
    
    <?php

/**

fgnievinski commented 4 years ago

Also, is there any tooling (po, gettext, weblate, etc) that will automatically identify changed/removed en_US strings, so I don't have to delete these from other locales when committing changes?

The support for monolingual Gettext in Weblate includes the "needs editing" status (in addition to "untranslated" and "translated"). See right-most column in the table of "Translation types capabilities" https://docs.weblate.org/en/latest/formats.html#translation-types-capabilities

This is super important to keep translations consistent with changes in the base English version: https://docs.weblate.org/en/latest/workflows.html#translation-states

The only requirement is setting en_US as monolingual base language file in Weblate:

For correct use of monolingual files, Weblate requires access to a file containing complete list of strings to translate with their source - this file is called Monolingual base language file within Weblate, though the naming might vary in your application.

https://docs.weblate.org/en/latest/formats.html#bilingual-and-monolingual-formats
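Weblate tracks these states itself; the sketch below only illustrates why the base file matters: a translation becomes suspect when the English text behind its key changes. It is a Python sketch with hypothetical names, assuming two versions of the en_US file are available as key -> string dicts.

```python
from typing import Dict, List


def needs_editing(old_base: Dict[str, str], new_base: Dict[str, str],
                  translation: Dict[str, str]) -> List[str]:
    """Translated keys whose en_US source text changed between two base versions."""
    return sorted(
        key for key, text in translation.items()
        if text and key in old_base and key in new_base
        and old_base[key] != new_base[key]
    )


old_en = {"common.save": "Save", "common.cancel": "Cancel"}
new_en = {"common.save": "Save changes", "common.cancel": "Cancel"}
fr_fr = {"common.save": "Enregistrer", "common.cancel": "Annuler"}

print(needs_editing(old_en, new_en, fr_fr))
```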

veotax commented 4 years ago

Any update? Is there any chance we can start translating the .po files now? I assume they will be compatible with Weblate later?

asmecher commented 4 years ago

@veotax, what translation are you interested in working on?

veotax commented 4 years ago

Chinese (zh_CN). Is it ready to be translated now? And what project (ojs or pkp-lib) and what branch (master or stable-3_1_2) should I work on and send PR?

Can I copy the .po files from another folder like en_US to zh_CN and then translate the words? Or is there another process?

asmecher commented 4 years ago

@veotax, excellent! The 3.1.2-x releases will continue to be in our old .xml format, but version 3.2 and onward will use monolingual .po files (supported by Weblate). I would recommend targeting 3.2 (due for release early next year) and using .po. I have converted only selected languages in the master branch to .po, but if you're ready to begin working with Chinese in that format, I can convert it as well. Just let me know! There is already an existing zh_CN translation, it just needs to be updated.

veotax commented 4 years ago

So I should use master branch.

What do you mean by "I can convert it as well"? Do you have a tool/script that generates .po files from the existing .xml files? If so, please do it. The original zh_CN .xml files are already missing some strings (untranslated words in the UI), so I assume the generated .po files will be missing them too, right? In that case I need to translate them.

asmecher commented 4 years ago

@veotax, I've just converted the .xml files over to .po for the zh_CN locale. (There's a tool for this in lib/pkp/tools/xmlToPo.php in the master branch, which will be released as OJS 3.2.)

For the .po files to work, you'll need a full checkout of the master branch, rather than an existing OJS 3.1.2.x release.

The commits with the file conversions are here: https://github.com/pkp/ojs/commit/831f4a386ef56ec68e407bd0eef42f108af64c5f https://github.com/pkp/pkp-lib/commit/57ccd97f7c42c9e31c8061be30a3921352a8f565

fgnievinski commented 4 years ago

@asmecher: should emailTemplates.xml be converted to PO, too?

asmecher commented 4 years ago

@fgnievinski, no, that's a different XML dialect; I'm still considering what best to do with that. It probably makes the most sense to convert it to .po as well, but we would need a mechanism to link email keys (e.g. NOTIFICATION) with email body, subject, and description for each language.

marcbria commented 4 years ago

@asmecher check your mail and confirm weblate server is working, please. ;-)

asmecher commented 4 years ago

@marcbria, check Slack :) Too many venues!

fgnievinski commented 4 years ago

@asmecher, still about emailTemplates.xml: how about replacing everything between <email_text key="NOTIFICATION"> and </email_text> with {translate key="email_text_key_NOTIFICATION"}? The HTML tags can be left inside the localized text; we translators are used to dealing with those.

asmecher commented 4 years ago

@fgnievinski, we may well end up doing something like that. Thanks for the suggestion!

veotax commented 4 years ago

@asmecher, is there any way to list all untranslated strings together in the PO file? I found that the converted zh_CN PO files (https://github.com/pkp/pkp-lib/commit/57ccd97f7c42c9e31c8061be30a3921352a8f565) don't contain all the strings. Currently, I have to copy each untranslated key from the web UI (sometimes it can't even be copied) into the PO file and translate it. It's too slow.


asmecher commented 4 years ago

@veotax, Weblate will help with that (and I suspect other translation tools capable of working with monolingual PO files as well). They'll do that by fetching the full list of locale keys from the English locale files, then comparing them with your translation to determine what's missing.
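The comparison described here (take the full key list from the English files, then see which keys a translation lacks) is easy to reproduce locally while waiting for Weblate. A Python sketch, assuming both files have already been parsed into key -> string dicts; writing the English text as a `#.` comment above each empty entry is one possible convention for offline work, not PKP's.

```python
from typing import Dict, List


def untranslated(base: Dict[str, str], translation: Dict[str, str]) -> List[str]:
    """Keys in the en_US base with no (or an empty) translation, in base order."""
    return [key for key in base if not translation.get(key)]


def stub_entries(base: Dict[str, str], translation: Dict[str, str]) -> str:
    """Empty monolingual PO entries for the missing keys, English shown as a comment.

    Assumes single-line English strings and keys that need no PO escaping.
    """
    lines: List[str] = []
    for key in untranslated(base, translation):
        lines += ["#. " + base[key], 'msgid "' + key + '"', 'msgstr ""', ""]
    return "\n".join(lines)


en_us = {"common.save": "Save", "common.cancel": "Cancel", "common.close": "Close"}
zh_cn = {"common.save": "保存", "common.cancel": ""}

print(stub_entries(en_us, zh_cn))
```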

veotax commented 4 years ago

Thanks. So before our official Weblate is online, can you recommend some tool (local or web-based) that I can use to start to translate painlessly?

asmecher commented 4 years ago

@veotax, our XML-based translation toolset used to do this, and I'm sure Weblate does, but I haven't tried other tools.

marcbria commented 4 years ago

If you are working with the native OJS XML format, the only tool is the OJS translation plugin.

If you are working with the new PO files, there are plenty of tools. I suggest two:

Here's an article listing the most common ones:

It's still early and some research needs to be done, but I'm planning to encourage my translators to work offline on big translations. Weblate will also be great, but when you are doing a looong job, the web lag will kill your patience. Desktop tools have more features and are faster, and when you finish, you will (hopefully) be able to upload the results to Weblate.

asmecher commented 4 years ago

Converting email templates to the PO format!

PRs:

This preserves the old XML format (locale/en_US/emailTemplates.xml), but replaces the (localized) contents with {translate ...} calls, e.g.:

        <email_text key="NOTIFICATION">
                <subject>{translate key="emails.notification.subject"}</subject>
                <body>{translate key="emails.notification.body"}</body>
                <description>{translate key="emails.notification.description"}</description>
        </email_text>

Then the translations themselves come from a new PO file, e.g. locale/en_US/emails.po for English.
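In the application these `{translate ...}` calls are handled by its own template machinery; purely as an illustration of the lookup, here is a Python sketch that substitutes each call with the string from a parsed `emails.po` (represented as a dict). The function name and the `##key##` fallback for missing keys are hypothetical.

```python
import re
from typing import Dict

# Matches the {translate key="..."} calls used in the emailTemplates.xml snippet.
TRANSLATE_CALL = re.compile(r'\{translate key="([^"]+)"\}')


def resolve_translate_calls(template: str, messages: Dict[str, str]) -> str:
    """Replace each {translate key="..."} with its string from the emails PO file."""
    def lookup(match: "re.Match[str]") -> str:
        key = match.group(1)
        return messages.get(key, "##" + key + "##")  # visible marker if missing
    # A function replacement is inserted literally, so HTML and backslashes survive.
    return TRANSLATE_CALL.sub(lookup, template)


emails_po = {
    "emails.notification.subject": "New notification",
    "emails.notification.body": "<p>You have a new notification.</p>",
}
subject = resolve_translate_calls(
    '<subject>{translate key="emails.notification.subject"}</subject>', emails_po)
```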

There's a new conversion tool to help with this in lib/pkp/tools/xmlEmailsToPo.php. It generates the new PO file and changes the old files over to {translate ...} calls.

@marcbria, could you take a quick look? Does this seem like a workable approach? If so, I can merge and set up a new "Emails" component in Weblate.