matecat / MateCat-Filters

Convert any file to XLIFF and back with perfectly preserved formatting! Super easy API, plenty of supported formats and advanced segmentation.
http://filters.matecat.com
GNU Lesser General Public License v3.0

Filters doesn't preserve trailing whitespaces inside CDATA #13

Closed LLCampos closed 8 years ago

LLCampos commented 8 years ago

Consider this XML:

<xml>
    <element><![CDATA[This an example xml

    ]]></element>
</xml>

The CDATA content ends with trailing whitespace (a blank line followed by indentation). When I convert this file to XLIFF with MateCat Filters and then convert it back to the original format using the translation, all the trailing whitespace disappears. This does not happen when the text is not inside CDATA.
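One way to check this locally is to compare the trailing whitespace of the element text before and after the round trip. The sketch below is only an illustration: it does not call MateCat Filters, the helper names are hypothetical, and `ROUND_TRIPPED` is a hand-written stand-in for the kind of output described above, not real converter output.

```python
import xml.etree.ElementTree as ET


def trailing_whitespace(xml_text, tag="element"):
    """Return the trailing whitespace of the text of every <tag> element.

    ElementTree merges CDATA sections into the element text, so the
    whitespace inside the CDATA is visible in node.text.
    """
    root = ET.fromstring(xml_text)
    result = []
    for node in root.iter(tag):
        text = node.text or ""
        # Everything after the last non-whitespace character.
        result.append(text[len(text.rstrip()):])
    return result


def same_trailing_whitespace(before, after, tag="element"):
    """True if both documents end each <tag> text with the same whitespace."""
    return trailing_whitespace(before, tag) == trailing_whitespace(after, tag)


# The first example from this issue, with its whitespace written explicitly.
ORIGINAL = (
    "<xml>\n"
    "    <element><![CDATA[This an example xml\n"
    "\n"
    "    ]]></element>\n"
    "</xml>"
)

# Hypothetical stand-in for a round-tripped file that lost the whitespace.
ROUND_TRIPPED = (
    "<xml>\n"
    "    <element><![CDATA[This an example xml]]></element>\n"
    "</xml>"
)

if __name__ == "__main__":
    print(repr(trailing_whitespace(ORIGINAL)))
    print(same_trailing_whitespace(ORIGINAL, ROUND_TRIPPED))
```

In practice, `ROUND_TRIPPED` would be replaced by the content of the file produced by the back-conversion.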

Trailing spaces are also deleted when the text is inside tags inside CDATA. For example:

<xml>
    <element>
        <![CDATA[<tag>This an example xml </tag>]]>
    </element>
</xml>

The resulting XLIFF does not have the trailing whitespace at the end of the text. This does not happen when <tag>This an example xml </tag> is not wrapped in CDATA.

LLCampos commented 8 years ago

I've updated the issue with another example.

giusilvano commented 8 years ago

@LLCampos thank you. After studying the issue a bit, I pushed a new commit to the dev branch that should fix it. The XML filter was using the default Okapi HTML filter to subfilter the CDATA parts. With the new changes, the default MateCat HTML configuration is used instead, and it preserves spaces.

LLCampos commented 8 years ago

Nice! :)