microsoft / XLIFF2-Object-Model

If you’re looking to store localization data and propagate it through your localization pipeline allowing tools to interoperate then you may want to use the XLIFF 2.0 object model. The XLIFF 2.0 object model implements the OASIS Standard for the XLIFF 2.0 specification as defined at http://docs.oasis-open.org/xliff/xliff-core/v2.0/xliff-core-v2.0.html.

<source> elements containing only spaces are emptied on deserialization #11

Closed ysavourel closed 7 years ago

ysavourel commented 8 years ago

It looks like the content of <source> is removed on reading when it consists only of spaces. For example, given this:

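// Namespaces for the OM types are assumed from the type names in the output below;
// adjust them if your version of the library differs.
using System;
using System.IO;
using System.Text;
using Localization.Xliff.OM;
using Localization.Xliff.OM.Core;
using Localization.Xliff.OM.Serialization;

string data = "<xliff srcLang='en' version='2.0' xmlns='urn:oasis:names:tc:xliff:document:2.0'>"
    + "<file id='f1'><unit id='u1'>"
    + "<segment><source>Sentence 1.</source></segment>"
    + "<ignorable><source> </source></ignorable>"
    + "<segment><source>Sentence 2.</source></segment>"
    + "</unit></file></xliff>";
using (MemoryStream ms = new MemoryStream(Encoding.UTF8.GetBytes(data)))
{
    XliffReader reader = new XliffReader();
    XliffDocument doc = reader.Deserialize(ms);
    // Walk every element in the document; print its type and, for text nodes, the text itself.
    foreach (XliffElement e in doc.CollapseChildren<XliffElement>())
    {
        Console.WriteLine("Type: " + e.GetType().ToString());
        if (e is PlainText)
        {
            PlainText pt = (PlainText)e;
            Console.WriteLine("Content: '" + pt.Text + "'");
        }
    }
}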

We get this output (no content for <ignorable>):

Type: Localization.Xliff.OM.Core.File
Type: Localization.Xliff.OM.Core.Unit
Type: Localization.Xliff.OM.Core.Segment
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.PlainText
Content: 'Sentence 1.'
Type: Localization.Xliff.OM.Core.Ignorable
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.Segment
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.PlainText
Content: 'Sentence 2.'

I would expect this output instead:

Type: Localization.Xliff.OM.Core.File
Type: Localization.Xliff.OM.Core.Unit
Type: Localization.Xliff.OM.Core.Segment
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.PlainText
Content: 'Sentence 1.'
Type: Localization.Xliff.OM.Core.Ignorable
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.PlainText
Content: ' '
Type: Localization.Xliff.OM.Core.Segment
Type: Localization.Xliff.OM.Core.Source
Type: Localization.Xliff.OM.Core.PlainText
Content: 'Sentence 2.'

This also happens for <segment> elements.

RyanKing77 commented 8 years ago

Thanks for reporting this issue. If you would like to contribute a fix, please do so via a pull request. Otherwise, we will evaluate and prioritize the fix as appropriate.

RyanKing77 commented 8 years ago

Upon further examination of the issue, this is by design. If you declare xml:space="preserve" then you will get the expected output. Perhaps you are expecting the default "processing mode" of the OM to be to preserve whitespace? In which case, this is not a bug but a design change.
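For illustration, the sample document from the report could declare xml:space="preserve" roughly as follows. This is only a sketch: placing the attribute on <unit> is an assumption made here (XLIFF 2.0 also allows it on other elements, such as <file>, <source> and <target>), and which placement the OM honors has not been verified.

```xml
<xliff srcLang="en" version="2.0" xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <file id="f1">
    <!-- Assumption: xml:space declared on <unit> applies to the <source> elements inside it -->
    <unit id="u1" xml:space="preserve">
      <segment><source>Sentence 1.</source></segment>
      <ignorable><source> </source></ignorable>
      <segment><source>Sentence 2.</source></segment>
    </unit>
  </file>
</xliff>
```

Under the XML rules, xml:space set on an element applies to everything in its content unless overridden, so one declaration on an enclosing element should cover both the <segment> and the <ignorable> sources.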

RyanKing77 commented 8 years ago

To clarify a bit more: this is the default behavior of the OM because it is based on the .NET XmlReader/XmlWriter, which do not preserve whitespace by default.

ysavourel commented 8 years ago

Looking more closely at the XML specification, I have to agree: default means the reader does whatever it wants. I had read it as: "The reader can normalize or preserve". But it seems that complete deletion is valid too.

So it's not a bug.

But I would change this to a request for a change in behavior. While removing spaces in outer content is fine, completely removing spaces in content elements like <source> and <target> is probably unwise. I would expect whitespace there to be either normalized or preserved.

I'll also post a note to answer your email on the XLIFF list.

RyanKing77 commented 8 years ago

Thanks for reporting this issue. If you would like to contribute a fix, please do so via a pull request. Otherwise, we will evaluate and prioritize the design change request appropriately.

RyanKing77 commented 7 years ago

Fixed with the following commits https://github.com/Microsoft/XLIFF2-Object-Model/commit/b0dadfe99603fefe599bf5d437cf10f0b03cbe7b https://github.com/Microsoft/XLIFF2-Object-Model/commit/08178a6625b3def5fba7e79378df7e32a8418b63