Closed ajacob closed 11 months ago
Please provide the original XML file you are trying to translate so I can reproduce the issue.
There are no external dependencies for this project, so there is no need for Maven or a pom.xml
file.
FWIW, the lines of code that you mention are correct. They do exactly what they are supposed to do.
Please provide the source file for testing or close this issue.
Hello @rmraya,
Everything you need to reproduce this issue is provided above.
Here are step by step instructions:
test.skl
and test.xlf
with the content from my first comment (you can simply copy/paste)merge.sh
utility like this: ~/OpenXLIFF/dist/merge.sh -xliff test.xlf -target result.xml
This will produce a result.xml
file with duplicated text (as shown above).
To my understanding the produced result is not valid but if you think everything behaves as expected, feel free to close this issue
@ajacob I need the original XML used to create the XLIFF. I want to perform a full roundtrip: generate XLIFF, translate and merge.
@ajacob You mention that you made changes to the XLIFF, I want to see the version generated by OpenXLIFF
@ajacob:
1) Created a source file like this
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G> Bonjour. </G> Ce text devrait être traduit </NIV1>
</ROOT>
2) generated XLIFF with this command:
convert.sh -file original.xml -srcLang fr-FR -tgtLang en-US -2.0 -type XMLG
and obtained this:
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr-FR" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/original.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/original.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment id="1-0">
<source xml:space="preserve"> Bonjour.</source>
</segment>
<ignorable id="1-1">
<source xml:space="preserve"> </source>
</ignorable>
</unit>
<unit id="2">
<segment id="2">
<source xml:space="preserve"> Ce text devrait être traduit </source>
</segment>
</unit>
</file>
</xliff>
3) Translated like this:
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr-FR" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/original.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/original.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment state="final" id="1-0">
<source xml:space="preserve"> Bonjour.</source>
<target xml:space="preserve"> Hello.</target>
</segment>
<ignorable id="1-1">
<source xml:space="preserve"> </source>
</ignorable>
</unit>
<unit id="2">
<segment state="final" id="2">
<source xml:space="preserve"> Ce text devrait être traduit </source>
<target xml:space="preserve"> This text should be translated </target>
</segment>
</unit>
</file>
</xliff>
4) Merged the translated file with this command:
merge.sh -xliff original.xml.xlf
and obtained this output file:
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G> Hello. </G> This text should be translated </NIV1>
</ROOT>
I don't know what you did, but the code seems to work.
I don't know what you did, but the code seems to work.
I'll try to give you more information / STR, and hopefully help you reproduce this issue.
I've created a very simple sample that involves another tool that add translations into the xlf
file.
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G>Bonjour.</G> Ceci est un text multi-ligne
qui a pour objectif de mettre en évidence
un simple bug. J'espère que ça aidera.</NIV1>
</ROOT>
config_ROOT.xml
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE ini-file PUBLIC "-//MAXPROGRAMS//Converters 2.0.0//EN" "configuration.dtd" >
<ini-file>
<tag hard-break="segment">NIV1</tag>
<tag ctype="x-bold" hard-break="inline">G</tag>
</ini-file>
./convert.sh -type XML -file input.xml -xliff test.xlf -srcLang fr -tgtLang en-US -skl test.skl -xmlfilter . -2.1
test.xlf
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.1" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/input.xml" id="1">
<skeleton href="test.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<mda:metadata id="1">
<mda:metaGroup category="attributes" id="ph0">
<mda:meta type="ctype">x-bold</mda:meta>
</mda:metaGroup>
</mda:metadata>
<originalData>
<data id="ph0"><G></data>
<data id="ph1"></G></data>
</originalData>
<ignorable>
<source xml:space="preserve"><ph id="ph0"/></source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
</segment>
<ignorable>
<source xml:space="preserve"><ph id="ph1"/></source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
</segment>
</unit>
</file>
</xliff>
test.skl
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1>%%%1%%%
</NIV1>
</ROOT>
This is an important step as the xlf file is modified by another tool that:
<target>
elementstest.xlf
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.1" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en-US" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/home/alexandre/hello/input.xml" id="1">
<skeleton href="test.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<mda:metadata id="1">
<mda:metaGroup category="attributes" id="ph0">
<mda:meta type="ctype">x-bold</mda:meta>
</mda:metaGroup>
</mda:metadata>
<originalData>
<data id="ph0"><G></data>
<data id="ph1"></G></data>
</originalData>
<ignorable>
<source xml:space="preserve">
<ph id="ph0"/>
</source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
<target>Hello.</target>
</segment>
<ignorable>
<source xml:space="preserve">
<ph id="ph1"/>
</source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
<target>This is a multi-line text that is supposed to demo a simple bug.</target>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
<target> I hope this helps.</target>
</segment>
</unit>
</file>
</xliff>
./merge.sh -xliff test.xlf -target result.xml
result.xml
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1>
<G>
Bonjour.
Hello.
</G>
Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug. J'espère que ça aidera.This is a multi-line text that is supposed to demo a simple bug. I hope this helps.</NIV1>
</ROOT>
And as you can see, we now have duplicated content.
To help you better understand the changes in the xlf file, here is the diff (generated by another tool)
--- test.xlf 2023-11-02 23:50:14.943475711 +0100
+++ test.xlf 2023-11-02 23:50:19.660208235 +0100
@@ -26,20 +26,27 @@
<data id="ph1"></G></data>
</originalData>
<ignorable>
- <source xml:space="preserve"><ph id="ph0"/></source>
+ <source xml:space="preserve">
+ <ph id="ph0"/>
+ </source>
</ignorable>
<segment id="1-0">
<source xml:space="preserve">Bonjour.</source>
+ <target>Hello.</target>
</segment>
<ignorable>
- <source xml:space="preserve"><ph id="ph1"/></source>
+ <source xml:space="preserve">
+ <ph id="ph1"/>
+ </source>
</ignorable>
<segment id="1-1">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
+ <target>This is a multi-line text that is supposed to demo a simple bug.</target>
</segment>
<segment id="1-2">
<source xml:space="preserve"> J'espère que ça aidera.</source>
+ <target> I hope this helps.</target>
</segment>
</unit>
</file>
@ajacob
Your sample does not need a custom filter configuration.
I used this command to convert input.xml
to XLIFF:
convert.sh -file input.xml -srcLang fr -tgtLang en -type XMLG -2.0
and got this XLIFF:
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/input.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/input.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment id="1">
<source xml:space="preserve">Bonjour.</source>
</segment>
</unit>
<unit id="2">
<segment id="2-0">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
</segment>
<segment id="2-1">
<source xml:space="preserve"> J'espère que ça aidera.</source>
</segment>
</unit>
</file>
</xliff>
I translated the XLIFF with Swordfish and got this translated version:
<?xml version="1.0" encoding="UTF-8" ?>
<xliff srcLang="fr" xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0" version="2.0" xmlns:mda="urn:oasis:names:tc:xliff:metadata:2.0" trgLang="en" xmlns="urn:oasis:names:tc:xliff:document:2.0">
<file original="/Users/rmraya/Desktop/input.xml" id="1">
<skeleton href="/Users/rmraya/Desktop/input.xml.skl"/>
<mda:metadata>
<mda:metaGroup category="format">
<mda:meta type="datatype">xml</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="tool">
<mda:meta type="tool-id">OpenXLIFF</mda:meta>
<mda:meta type="tool-name">OpenXLIFF Filters</mda:meta>
<mda:meta type="tool-version">3.16.0 20231029_1141</mda:meta>
</mda:metaGroup>
<mda:metaGroup category="PI">
<mda:meta type="encoding">UTF-8</mda:meta>
</mda:metaGroup>
</mda:metadata>
<unit id="1">
<segment state="final" id="1">
<source xml:space="preserve">Bonjour.</source>
<target xml:space="preserve">Hello.</target>
</segment>
</unit>
<unit id="2">
<segment state="final" id="2-0">
<source xml:space="preserve"> Ceci est un text multi-ligne qui a pour objectif de mettre en évidence un simple bug.</source>
<target xml:space="preserve"> This is a multi-line text that aims to highlight a simple bug.</target>
</segment>
<segment state="final" id="2-1">
<source xml:space="preserve"> J'espère que ça aidera.</source>
<target xml:space="preserve"> I hope it will help.</target>
</segment>
</unit>
</file>
</xliff>
Then, I merged the file using this command:
merge.sh -xliff translated.xlf -target translated.xml
and obtained this translated file:
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
<NIV1><G>Hello.</G> This is a multi-line text that aims to highlight a simple bug. I hope it will help.</NIV1>
</ROOT>
Note that the segments of the XLIFF that you merged are not confirmed.
@rmraya
Your sample does not need a custom filter configuration.
The sample doesn't, the 200KB+ XML file I was working on and on which I noticed the bug does The goal of the sample is to demonstrate the bug with very concise and simple data
When I created this issue, my objective was not to find a workaround or find a way to make the sample work, the purpose was to show you that the merge operation can fail with valid data (not produced by Swordfish).
@ajacob
As far as I can see, the merge operation works as designed. I don't see a bug. The lines of code that you mentioned are doing exactly what is intended, recover text and tags from <source>
when <ignorable>
does not have <target>
.
If you want me to fix anything, provide a real test case that I can use to reproduce the problem.
Hello, thank you for sharing this tool.
I found a problem related to merge operation, I'll try to describe it in this issue
How to reproduce
test.skl
test.xlf
command
actual xml result
expected xml result
Investigation
First, I observed that this issue appears because I formatted the xliff file. If I rollback some changes, the result is correct. Rolled back changes are:
→
(needed as well for
ph1
)Looking into the code, I think I understand the problem.
In https://github.com/rmraya/OpenXLIFF/blob/v3.15.0/src/com/maxprograms/xliff2/FromXliff2.java#L294-L326
We have:
At this point
joinedSource
andjoinedTarget
point to the same content (same java objects per references)Then l. 326 we have:
Which internally calls
joinedSource.getContent();
which itself callsElement#mergeText
.And this is precisely were we have an issue. When harvesting for the source we call the
mergeText
function that mutates someXMLnode
s that are also referenced by the target. After harvesting for the source, the content of target becomes invalid.To confirm this hypothesis, I simply removed the call to
#mergeText
inElement#getContent
and it actually fixes the problem.Solution
I don't know what would be the best to fix this issue, I think there are different possibilities like:
#getContent
to mutate the state ofElement
)equals
method, which can be dangerous as wellElement
objets, in which case that would requires to copyXMLNode
sIf you want me to share a PR with a fix proposal, feel free to ask.
By the way, I think it could be very helpful to introduce a
pom.xml
in order to have maven dependencies (in which case it would be necessary to publishXMLJava
in maven central)