Closed villelahdenvuo closed 5 years ago
Me or @probertson will look into it...
Great, for now I'm relying on a regex to add linebreaks before parsing it.
If I add a linebreak before the
</target>
it parses correctly.
also before </source>
Yes, both elements fail to parse correctly, so it may be an issue with the parser library instead.
I know, why this happens, but I don't know if it's a feature or a bug... It's this line: https://github.com/locize/xliff/blob/master/xml-js/xmlToObject.js#L16 I hope @probertson can help.
If it's a feature it's pretty bad because it's causing me to lose my data when converting formats. :smile: I can look into providing a PR for a fix, but we need to see what problem is that code solving.
Sorry for not joining the conversation sooner -- I was asleep before (time zones 😄) but I'll take a look now.
You're right @adrai, that line is causing the problem:
return valueElement.text.indexOf('\n') === -1 ? valueElement.text : valueElement.text.substr(0, valueElement.text.lastIndexOf('\n'));
I added that line because of a quirk with the tests. It was a weird case where the reader and writer didn't round-trip correctly in some cases.
The short answer is that there are two options:
Here's a long-ish explanation, but to be clear I'm not trying to justify this behavior or say it's correct -- just explain this "feature."
That line was to handle an edge case with the The way the parsing library handles whitespace, it either treats all whitespace as significant or no whitespace as significant. That makes sense, but it causes a problem when an element contains multiple children, specifically:
For example:
<source>
<ph id="name" ctype="x-i18next-param">{{name}}</ph> is
<ph id="how" ctype="x-i18next-param">{{how}}</ph>
</source>
There is a space after the word "is" at the end of the second line, and that space is significant whitespace (the string it represents is "{{name}} is {{how}}"). However, all the other space is not significant whitespace.
In the real world I would probably write out unformatted xml (without line breaks and indentation) so it wouldn't be an issue -- all whitespace would be significant. In this case, to make the test fixture easier to read I left it formatted, and tried to handle the case where there is significant whitespace followed by non-significant whitespace.
I mentioned the solution ideas above. I will start working on approach 2 (fixing the bug), but if you have a strong preference for 1 let me know and I'll remove the conditional and update the test fixtures instead.
Another solution would be if the parsing library allowed fine-grained control over indentation and line breaks, but that's probably overkill for most cases since only humans care about readable XML, not machines 😄
I do not personally like code that is for tests outside of the test folder, but I understand maintaining the formatting of the fixtures is nice. Perhaps the issue could be handled in the tests, for example minifying the xml (non-sensitive spaces) before the comparison?
However I'm fine with solution 2 as long as it's robust enough. I'm sure you can come up with something better than my current regex hack. 😄
The problem in my case is that I use a tool called xliffmerge to copy updated translate units to the locale files and it removes any linebreak at the end of the elements, so even if I had a linebreak after adding new translations it would remove them and then lose the last line of the translation.
fixed in v4.1.3 thanks to the awesome help of @probertson ;-)
I'm trying to parse an XLIFF 1.2 file using the xliff12ToJs function, but for some reason it cuts out the last line of a multiline html translation if the ending target tag is on the same line. Here is a minimal repro: https://runkit.com/embed/6ahusa56f2r5
If I add a linebreak before the
</target>
it parses correctly.