matecat / MateCat-Filters

Convert any file to XLIFF and back with perfectly preserved formatting! Super easy API, plenty of supported formats and advanced segmentation.
http://filters.matecat.com
GNU Lesser General Public License v3.0

Information lost when converting back to PO #5

Closed LLCampos closed 8 years ago

LLCampos commented 8 years ago

Hi :) I'm having problems converting XLIFF files back to PO using the original2xliff endpoint. Specifically, the problem is with what Filters does to multi-line strings. For example, I have this msgstr string:

msgstr ""
"Por favor, active su cuenta. Debería haber recibido un correo electrónico "
"con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."

When I convert to XLIFF and back using the original2xliff endpoint, I get this:

msgstr "Por favor, active su cuenta. Debería haber recibido un correo electrónico con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."

(the string is all on the same line)

So, Filters doesn't maintain the formatting. Is this intentional behavior?

(I also want to thank you for the great tool you have here :+1: )

giusilvano commented 8 years ago

Hi LLCampos,

According to the PO specification, the two msgstr entries you wrote represent the same string.

Quoting the PO spec (https://goo.gl/rRWK1R):

One should carefully distinguish between end of lines marked as ‘\n’ inside quotes, which are part of the represented string, and end of lines in the PO file itself, outside string quotes, which have no incidence on the represented string.

So no information is actually lost. The change in PO syntax has no effect on the information carried, which is properly maintained.
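To make the equivalence concrete, here is a minimal Python sketch (not part of Filters; the function name `po_string_value` and the two constants are made up for this illustration). It assumes a simplified reading of the PO syntax in which the value of an entry is just the concatenation of its quoted fragments, and it leaves escape sequences such as `\"` untouched:

```python
import re

# The two forms from this issue, kept exactly as they appear in the PO files.
MULTI_LINE = r'''msgstr ""
"Por favor, active su cuenta. Debería haber recibido un correo electrónico "
"con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."
'''

SINGLE_LINE = r'''msgstr "Por favor, active su cuenta. Debería haber recibido un correo electrónico con el asunto \"Bienvenido a Unbabel 'con un enlace de activación."
'''


def po_string_value(entry):
    """Join the quoted fragments of one msgstr entry (hypothetical helper).

    Line breaks outside the quotes are ignored, which is the point of the
    quoted passage. Escapes are not decoded; that is enough for a comparison.
    """
    fragments = []
    for line in entry.splitlines():
        # Everything between the first and the last double quote on the line.
        match = re.search(r'"(.*)"', line)
        if match:
            fragments.append(match.group(1))
    return "".join(fragments)


# Both layouts carry the same string.
print(po_string_value(MULTI_LINE) == po_string_value(SINGLE_LINE))
```

A real PO parser also has to handle comments, msgid/msgctxt, plural forms and escape decoding, so this only illustrates the one rule discussed here.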

Unfortunately, although we also try to maintain the original syntax, in cases like this it is impossible. The reason is that the original2xliff endpoint is meant to merge translated strings back into the original format, and there is no way to decide where to split the sentence in the translated language.
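If the wrapped layout matters on your side (for example, to keep diffs small), one option is to re-wrap the converted file yourself at a fixed width, since any split outside the quotes yields the same string. The following is only a sketch of that idea, not something Filters does; `wrap_po_value`, `format_msgstr` and the width of 80 are assumptions chosen for the example:

```python
import re


def wrap_po_value(value, width=80):
    """Split an already-escaped PO value into fragments (hypothetical helper).

    Breaks happen only after whitespace, and every character is kept, so
    joining the fragments gives back the original value.
    """
    chunks = re.findall(r'\S+\s*|\s+', value)  # words keep trailing spaces
    lines, current = [], ""
    for chunk in chunks:
        # +2 accounts for the double quotes surrounding each fragment.
        if current and len(current) + len(chunk) + 2 > width:
            lines.append(current)
            current = ""
        current += chunk
    if current:
        lines.append(current)
    return lines


def format_msgstr(value, width=80):
    """Render a msgstr entry, using the multi-line form when it wraps."""
    lines = wrap_po_value(value, width)
    if len(lines) <= 1:
        return 'msgstr "%s"' % value
    return "\n".join(['msgstr ""'] + ['"%s"' % line for line in lines])


VALUE = r'''Por favor, active su cuenta. Debería haber recibido un correo electrónico con el asunto \"Bienvenido a Unbabel 'con un enlace de activación.'''

print(format_msgstr(VALUE))
```

The break positions depend only on the chosen width, so they will not necessarily match the ones in the original file. GNU gettext's msgcat can also re-wrap PO files (see its --width and --no-wrap options).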

The "formatting" we maintain refers to bold, italics and so on in formats like Word documents or HTML. We try to maintain the original file syntax too, but that is not always possible.

(Thanks for the compliment :D )