serge-community / serge

Continuous localization platform
https://serge.io/

Option to use .xliff as translation interchange files #11

Closed prat0088 closed 5 years ago

prat0088 commented 8 years ago

I'm new to Serge. This week I have been reading through the docs, code, and presentations. From what I gather Translation Interchange File must be .po files. I was wondering if you're open to adding an option for xliff files, and how you think it would be best implemented in Serge.

iafan commented 8 years ago

My vision is that the transport file serializer/parser should be a plugin, and support for .po transport files would become a plugin as well (used by default if not specified explicitly, for backward compatibility). This requires externalizing the .po handling code and preparing an infrastructure for such plugins. After that, writing support for XLIFF or other formats should be straightforward (provided such a format has a notion of "developer comment", "translator comment", "context", and a "fuzzy/needs work" flag besides just having key-value pairs; otherwise much of Serge's power in providing this information would be lost).

Having said that, the .po format has worked perfectly for our needs so far, and this is why I haven't externalized the related code yet. If you need that, I can work on preparing the infrastructure.

It would also be good to know why you need XLIFF. Is it strictly required for integration with some translation service, or just a matter of preference?

prat0088 commented 8 years ago

I see. That seems like a reasonable design.

The reason I'm interested in this possibility is because the commercial CAT tools I'm trialing have better support for xliff.

iafan commented 8 years ago

If your intent is to use Serge with some external CAT tool in order to implement the continuous localization approach, make sure the tool does not work in terms of "uploading translation jobs", but can simply reflect the current state of your translation files (in other words, there should be a way to synchronize with the files by adding missing strings to the CAT tool's internal database and removing outdated ones from it). Or this could be an offline CAT tool that you can simply use to open translation files directly.

If you want, ping me on our IRC channel so we could discuss this in a bit more detail.

I also strongly suggest looking at Pootle as your translation frontend (this is what we use at Evernote), as it works with Serge beautifully.

nemoeslovo commented 8 years ago

A possible workaround for now could be a TranslationService plugin that uses the xliff2po utility from the Pootle developers to convert the generated files before actually performing pull-ts and push-ts.

prat0088 commented 8 years ago

I just came across another use case for this request:

For very large translation requests, it's sometimes convenient for our translators to use Excel because of macros and navigation speed. It's what most are familiar with. If we could load Pootle with .csv files instead of .po files, then the translators would have the option of doing offline translation on the .csv.

I'm still interested in modularizing Engine.pm so that users can choose their ts output format.

iafan commented 8 years ago

I'll look at externalizing this soon. This needs to be done for the sake of code quality/maintainability regardless of the external use cases (which I might not fully understand or agree with).

Speaking of CSV/Excel, I understand that it might sound convenient (it would ensure a smoother path for your translators), but it has downsides (just for you to be aware of):

  1. Lack of TM (similar translations) provided within translation UI
  2. Lack of terminology suggestions
  3. Lack of quality checks and immediate feedback
  4. Lack of the "Needs review" notion
  5. Translators might be working with the strings which are already outdated (and which were removed from Pootle).

If your concern is primarily speed, I'd suggest looking into both options:

  1. Improving Pootle speed when it comes to navigating between units. There's room for optimization there, but if you see poor performance, this is something that the Pootle devs need to be aware of.
  2. As a transitional step, you might want to look at offline editing of .po files (there are tools that allow you to translate .po offline, and their UI is still orders of magnitude better than using Excel for translation). Take a look at Virtaal (done by the same guys behind Pootle), or POEdit.

prat0088 commented 8 years ago

Thanks for pointing out the downsides. I am aware of them, but there are a few small cases where it can make sense for the type and quantity of content we translate here. I'm not advocating it as the go-to tool. Just something to keep in their back pockets in the rare case it is needed.

If your concern is primarily speed...

General interface responsiveness is one part of "speed". I agree Pootle should be enhanced as you suggest.

Pre-translation and suggestions are the other part of "speed" in certain instances for certain categories of requests. We found Excel and one commercial CAT tool work great for us here. This is entirely in-house and business related so I can't go into more detail. I'm reasonably certain this makes sense to us so I'll just leave it as our special case.

erikogan commented 8 years ago

I would love to be able to use XLIFF 2.0 as the translation interchange format over PO. I would also be happy to help with the efforts to externalize the PO support.

(I am relatively new to Serge, and my Perl skills have atrophied over the last 8-10 years, but I think this could be a great way to familiarize myself with the codebase, and those muscles were well developed once, they will return eventually.)

iafan commented 8 years ago

@erikogan thanks for volunteering! I think we can split the effort where I'll deal with externalizing the current code and creating a .po serializer plugin, and you could work off that to provide XLIFF serializer.

iafan commented 8 years ago

Ok, the first part of the work is done.

@prat0088 I also added a CSV serialization plugin. Let me know if this works for you. In addition to providing translations, you can also mark strings as needing work there, and translators can provide comments in a separate column. See the new serialize_csv test for more information and a sample .csv file. Docs on serge.io will be added later.

prat0088 commented 8 years ago

@iafan Thanks! I think I'll have time in the next week to try it out.

iafan commented 8 years ago

Documentation has been added.

whereisjim commented 7 years ago

Any update with XLIFF 2.0 serializer?

iafan commented 7 years ago

@whereisjim I haven't heard of any such activity, but it shouldn't be that hard to add now that we have an XLIFF 1/2 parser to borrow code from.

Is the absence of an XLIFF serializer preventing you from using Serge in a specific scenario? What do you do with the serialized files?

whereisjim commented 7 years ago

We are trying to use the markup tag in XLIFF to block some keywords so we can prevent over-translation of certain strings such as product names, function names, code, etc.

iafan commented 7 years ago

Ok, so you need an XLIFF serializer with the ability to specify (by means of, e.g., regular expressions) some sequences that should be marked as untranslatable.
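As an illustration of that idea, here is a rough Python sketch (not code from Serge or from any serializer mentioned in this thread). It wraps every regex match in the XLIFF 1.2 inline element `<mrk mtype="protected">` and XML-escapes everything else; in XLIFF 2.0 the corresponding annotation would be `<mrk translate="no">`. The function name `protect` and the sample patterns are assumptions made up for this example.

```python
import re
from xml.sax.saxutils import escape


def protect(text, patterns):
    """Wrap each match of any pattern in <mrk mtype="protected">; escape the rest.

    `patterns` is a list of regular expression strings. They are combined with
    alternation, so they should not rely on numbered backreferences.
    """
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    parts = []
    pos = 0
    for match in combined.finditer(text):
        if match.start() == match.end():
            continue  # ignore empty matches
        parts.append(escape(text[pos:match.start()]))
        parts.append('<mrk mtype="protected">%s</mrk>' % escape(match.group(0)))
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "".join(parts)


# Hypothetical patterns: a product name and function-call-like tokens.
print(protect("Open Evernote & call sync()", [r"Evernote", r"\w+\(\)"]))
# prints: Open <mrk mtype="protected">Evernote</mrk> &amp; call <mrk mtype="protected">sync()</mrk>
```

The returned string is meant to be placed inside a `<source>` element; whether a given CAT tool actually honors the protection is something to verify per tool.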

Do you send these XLIFF files to some external localization vendor? Or do you have your own localization software? The reason I ask is that we use Serge with the Zing translation server (our own fork of Pootle), and we found that, instead of locking parts of the string as non-translatable, it's easier (both implementation-wise and from the translator's experience perspective) to allow editing the entire string, but have this string immediately validated afterwards. We implemented many quality checks, so that if someone breaks placeholders or tags, they will immediately see this in the translation UI, and localization managers will see this as well, so it is pretty trivial to go through failing units and fix them.
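For readers wondering what such a validate-after-edit check could look like, below is a small Python sketch. It is not Zing's implementation; the placeholder syntaxes it recognizes (printf-style `%s`, `%d`, `%1$s`, and `{name}`) and the function name `check_placeholders` are assumptions chosen for the example.

```python
import re
from collections import Counter

# Assumed placeholder syntaxes: %s, %d, positional %1$s, and {name}.
PLACEHOLDER = re.compile(r"%(?:\d+\$)?[sd]|\{\w+\}")


def check_placeholders(source, translation):
    """Return a list of problems; an empty list means the check passed."""
    src = Counter(PLACEHOLDER.findall(source))
    dst = Counter(PLACEHOLDER.findall(translation))
    problems = []
    # Placeholders present in the source but lost (or used fewer times).
    for token in sorted((src - dst).keys()):
        problems.append("missing placeholder: " + token)
    # Placeholders that appear only (or more often) in the translation.
    for token in sorted((dst - src).keys()):
        problems.append("unexpected placeholder: " + token)
    return problems


print(check_placeholders("Hello, {name}!", "Bonjour, {nom} !"))
# prints: ['missing placeholder: {name}', 'unexpected placeholder: {nom}']
```

A check like this lets translators edit the whole string freely while still flagging a unit as failing the moment a placeholder is dropped or renamed.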

whereisjim commented 7 years ago

Right now, our process is developed using Catalyst, and we are thinking about moving to the web. Thanks for the link to Zing. I just checked the page and have a quick question about 'Requirements: TTK bump'. What does TTK mean here?

whereisjim commented 7 years ago

Actually, we have a similar plan regarding locking. We are also thinking about using Terminology to prevent over-translation instead of actually locking the terms in the sentences.

iafan commented 7 years ago

What does TTK mean here?

TTK stands for Translate Toolkit (an underlying library used in Pootle and [still] used in Zing)

dragosv commented 6 years ago

Beta versions of xliff serializers are here

Xliff 1.2 https://github.com/dragosv/serge/tree/xliff https://github.com/dragosv/serge/blob/xliff/lib/Serge/Engine/Plugin/serialize_xliff.pm

Xliff 2.0 https://github.com/dragosv/serge/tree/xliff2 https://github.com/dragosv/serge/blob/xliff2/lib/Serge/Engine/Plugin/serialize_xliff2.pm

Xliff 1.2 has been tested against various translation services while 2.0 was only tested manually (Using Ocelot)

Any feedback would be appreciated. Will add documentation soon. For now, there are tests for most of the options the serializers support.

dragosv commented 6 years ago

In case anyone is wondering, two versions of the xliff serializer are needed because Xliff 1.x and 2.x diverged heavily and there is no backwards compatibility.

iafan commented 6 years ago

@dragosv why do you think it's important to support both versions? Do other services only support 1.x at the moment?

dragosv commented 6 years ago

Most only support 1.2 or a subset of it. Just a few support 2.0, so initially it is ok if only 1.2 is supported; then, after testing the 2.0 serializer against providers that support it, the xliff 2.0 serializer should be added.

dragosv commented 6 years ago

On top of that, 1.2 was tested against 6 providers and 5 support it to a certain level, while I have not tested 2.0 against any provider.

dragosv commented 6 years ago

@erikogan the xliff 2.0 serializer is here in a beta version. Please take a look and let me know what you think.

https://github.com/dragosv/serge/tree/xliff2 https://github.com/dragosv/serge/blob/xliff2/lib/Serge/Engine/Plugin/serialize_xliff2.pm

dragosv commented 6 years ago

Pull request created https://github.com/evernote/serge/pull/78 for the xliff 1.2 serializer

dragosv commented 5 years ago

@iafan This was merged and should be closed.