Incorrect handling of <ignorable>...</ignorable> sections placed by CAT tools after segments

mxs17 commented 3 years ago

Hello!

The problem arises when source document segments contain trailing whitespace. On import, SDL Studio separates the whitespace from the text and moves it into a separate <ignorable> element. E.g., if the source segment is "Hello ", the exported translated XLIFF will look like:

    <unit id="key1">
      <segment>
        <source>Hello</source>
        <target>Hi</target>
      </segment>
      <ignorable>
        <source xml:space="preserve"> </source>
        <target xml:space="preserve"> </target>
      </ignorable>
    </unit>

The createUnit function of xliff2js iterates over all children of <unit>, so the values from the last child win. When that last child is <ignorable> rather than <segment>, the resulting source and target are empty, as they are copied from <ignorable>. I believe this can be fixed by adding a check for the element name:

function createUnit (unit, initValues) {
  // source, target, note
  return unit.elements.reduce((unit, segment) => {
    // Proposed check: only read <segment> children and skip <ignorable> sections
    if (segment.name === 'segment') {
      segment.elements.forEach((element) => {
        switch (element.name) {
          case 'source':
          case 'target':
          case 'note':
            unit[element.name] = extractValue(element.elements, ElementTypes2)
            break
        }
      })
    }

    return unit
  }, JSON.parse(JSON.stringify(initValues)))
}
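
To make the effect of the name check easier to see, here is a minimal, self-contained sketch that does not use the library itself. The element tree is built by hand in the type/name/elements shape that the snippet above appears to work with, and extractText and readUnit are hypothetical stand-ins (not the library's extractValue or createUnit), so treat this as an illustration under those assumptions rather than the actual implementation.

```javascript
// Hypothetical stand-in for the library's extractValue:
// joins the text nodes found directly under an element.
function extractText (elements = []) {
  return elements
    .filter((node) => node.type === 'text')
    .map((node) => node.text)
    .join('')
}

// Helper that builds an element holding a single text node.
const el = (name, text) => ({
  type: 'element',
  name,
  elements: [{ type: 'text', text }]
})

// Hand-built tree for the <unit> from the report:
// a <segment> followed by an <ignorable> (assumed parser output shape).
const unitElement = {
  type: 'element',
  name: 'unit',
  elements: [
    { type: 'element', name: 'segment', elements: [el('source', 'Hello'), el('target', 'Hi')] },
    { type: 'element', name: 'ignorable', elements: [el('source', ' '), el('target', ' ')] }
  ]
}

// Reduces the children of <unit> into one result object.
// With onlySegments = false every child is read, so the last child overwrites
// the values collected from earlier ones.
function readUnit (unitElement, onlySegments) {
  return unitElement.elements.reduce((result, child) => {
    if (onlySegments && child.name !== 'segment') return result
    for (const element of child.elements || []) {
      if (['source', 'target', 'note'].includes(element.name)) {
        result[element.name] = extractText(element.elements)
      }
    }
    return result
  }, {})
}

const withoutCheck = readUnit(unitElement, false)
const withCheck = readUnit(unitElement, true)

console.log(withoutCheck) // { source: ' ', target: ' ' }
console.log(withCheck) // { source: 'Hello', target: 'Hi' }
```

In this sketch the unchecked version ends up with the whitespace from <ignorable>, while the checked version keeps the text from <segment>. Note that skipping <ignorable> entirely also drops the trailing space, which may or may not be what a caller wants.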

adrai commented 3 years ago

Would you like to send a Pull Request to address this? Remember to add unit tests.

mxs17 commented 3 years ago

Okay! I will try.

adrai commented 3 years ago

Thank you, released with v5.5.3.

locize / xliff

Incorrect handling of <ignorable>...</ignorable> sections placed by CAT tools after segments #39