transifex / transifex-old-core

Translation workflow & crowdsourcing for agile teams. Older, open-source codebase of Transifex.com
https://www.transifex.com/
GNU General Public License v2.0

Transifex breaks XML entities #267

Open tomer opened 11 years ago

tomer commented 11 years ago

Please don't automatically escape XML/HTML entities by replacing "&" with "& amp;". While that may be fine for some projects, it breaks every string that contains an entity, because an entity reference must never be re-escaped.

For example, a string that contains "&brandShortName;" should never ever be encoded to "& amp;brandShortName;".

screenshot

NOTE: I've added spaces after the ampersand characters because otherwise the correct and the wrong strings are displayed the same here. Of course, there are no spaces inside entities in the source files… ☺
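To make the request concrete, here is a minimal Python sketch of escaping that leaves existing entity references alone. It is not Transifex's actual code; the function name `escape_bare_ampersands` and the pattern for what counts as an entity name are assumptions chosen for illustration.

```python
import re

# An ampersand is "bare" when it does not already start a named entity
# (e.g. &brandShortName;) or a numeric character reference (&#160; / &#xA0;).
# The set of characters allowed in an entity name here is an assumption.
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][\w.\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Escape only ampersands that are not part of an entity reference."""
    return _BARE_AMP.sub("&amp;", text)


if __name__ == "__main__":
    print(escape_bare_ampersands("Welcome to &brandShortName;"))
    print(escape_bare_ampersands("Terms & Conditions"))
```

With this approach a string such as "&brandShortName;" passes through unchanged, while a lone "&" is still escaped. It cannot tell whether a translator meant a literal ampersand followed by text that merely looks like an entity, so a per-project switch may still be needed.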

tomer commented 11 years ago

This also reproduces with other entities. In the screenshot below you can see an entity that contains some HTML markup, which gets destroyed after exporting the strings.

screenshot2

The Transifex output is in the left pane. These strings were imported and exported without being changed by a human on the Transifex site.

tomer commented 11 years ago

Workaround: Run the following bash snippet after pulling the translations to fix double-encoded entities.

find . -type f -name "*.dtd" -exec sed -i "s/\&amp;\([a-zA-Z0-9\.]*\);/\&\1;/g" {} \;

CAUTION: This won't fix HTML tags that Tx has broken into entities, as seen in the second screenshot above.
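For anyone who cannot rely on `sed -i` (its behaviour differs between GNU and BSD), the same workaround can be sketched in Python. This is only an illustration: the function names are made up, UTF-8 encoding of the .dtd files is an assumption, and unlike the sed pattern it requires at least one character in the entity name.

```python
import re
from pathlib import Path

# Matches a double-encoded entity such as "&amp;brandShortName;".
_DOUBLE_ESCAPED = re.compile(r"&amp;([A-Za-z0-9.]+);")


def fix_double_escaped(text: str) -> str:
    """Turn "&amp;name;" back into "&name;"."""
    return _DOUBLE_ESCAPED.sub(r"&\1;", text)


def fix_dtd_files(root: str = ".") -> int:
    """Rewrite every *.dtd file under root in place; return files changed."""
    changed = 0
    for path in Path(root).rglob("*.dtd"):
        original = path.read_text(encoding="utf-8")  # assumed encoding
        fixed = fix_double_escaped(original)
        if fixed != original:
            path.write_text(fixed, encoding="utf-8")
            changed += 1
    return changed


if __name__ == "__main__":
    print(fix_dtd_files("."), "file(s) fixed")
```

Like the sed one-liner, this only undoes the extra ampersand escaping. It would also rewrite a string that intentionally contains an escaped entity name, so review the diff before committing.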

Edit: I've filed a support ticket on this problem here: http://support.transifex.com/customer/portal/questions/1476514

grote commented 10 years ago

I have this problem with Android strings.xml files that use HTML inside strings:

<string name="about">this is <b>bold</b>.</string>
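As an illustration of what a round trip would need to preserve in this case, the sketch below reads such a string with Python's standard `xml.etree.ElementTree` and reassembles its inner markup without turning the inline tags into entities. The helper name `inner_xml` and the surrounding `<resources>` wrapper are assumptions made for the example, not part of Transifex.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def inner_xml(element: ET.Element) -> str:
    """Return the element's content with child tags kept as markup."""
    # Leading text is re-escaped so literal "&" or "<" stay valid XML.
    parts = [escape(element.text or "")]
    for child in element:
        # ET.tostring serializes the child tag and its trailing text.
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


SAMPLE = (
    '<resources>'
    '<string name="about">this is <b>bold</b>.</string>'
    '</resources>'
)

resources = ET.fromstring(SAMPLE)
about = resources.find("string[@name='about']")

if __name__ == "__main__":
    print(inner_xml(about))
```

Treating the content of `<string>` as plain text and escaping it wholesale is what turns `<b>` into `&lt;b&gt;`; keeping child elements as elements avoids that.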

martinbonnin commented 8 years ago

I just bumped into this. The problem is still there.