sabre-io / xml

sabre/xml is an XML library that you may not hate.
http://sabre.io/xml/
BSD 3-Clause "New" or "Revised" License
516 stars 77 forks source link

no empty braces when empty or null the namespaceURI #182

Closed leli1337 closed 2 years ago

staabm commented 4 years ago

could you elaborate a bit what you are trying to achieve with this change and what the problems looks like you are trying to fix?

leli1337 commented 4 years ago

@staabm Consider an xml file called somexml.xml with this content:

<?xml version="1.0"?>
<books>
    <book>
        <title>Snow Crash</title>
        <author>Neil Stephenson</author>
    </book>
    <book>
        <title>Dune</title>
        <author>Frank Herbert</author>
    </book>
</books>

If you parse it and then convert to XML again (the following code) will result in an xml different from the initial.

$path = 'somexml.xml';
$fileStream = fopen($path, 'rw');
$xmlService = new Service;
$result = $xmlService->parse($fileStream);
$xml = $xmlService->write('root', $result);
<?xml version="1.0"?>
<root>
    <book xmlns="">
        <title xmlns="">Snow Crash</title>
        <author xmlns="">Neil Stephenson</author>
    </book>
    <book xmlns="">
        <title xmlns="">Dune</title>
        <author xmlns="">Frank Herbert</author>
    </book>
</root>

With the change made to code both the input and output will be the same.

staabm commented 4 years ago

could you turn you excellent example into a unit test?

leli1337 commented 4 years ago

could you turn you excellent example into a unit test?

Yes, I will create a new unit test for Reader.

staabm commented 4 years ago

Per method comment of the changed method in https://github.com/sabre-io/xml/pull/182/files#diff-3a8e508b4baf36a62dd9aca54f4c6387R30 I guess the fix is at the wrong spot

evert commented 4 years ago

Yes, the getClarkNotation shouldn't change. The change should be in the writer, not the reader. It should also consider the fact that a user might write an XML file with namespaces, but it could still add elements in the "empty" namespace with a prefix. So this might not be a super easy fix, but still doable I think.

It should be noted that the output is correct, it's just ugly

leli1337 commented 4 years ago

@staabm @evert The problem is not in writer. Consider the following code

The php array from parsing the xml example i provided is: As you can see the name contains {} concatenated.

[ 
  [
    'name' => '{}book',
    'value' => [
      [
        'name' => '{}title',
        'value' => 'Snow Crash',
        'attributes' => [],
      ],
      [
        'name' => '{}author',
        'value' => 'Neil Stephenson',
        'attributes' => [],
      ],
    ],
    'attributes' => [],
  ],
  [
    'name' => '{}book',
    'value' => [
      [
        'name' => '{}title',
        'value' => 'Dune',
        'attributes' => [],
      ],
      [
        'name' => '{}author',
        'value' => 'Frank Herbert',
        'attributes' => [],
      ],
    ],
    'attributes' => [],
  ],
];

Another example showing the right behavior of writer is this:

$array = [
    'name' => 'test',
    'value' => [
        [
            'name' => '{}empty-namespace',
            'value' => 'value',
        ], [
            'name' => 'no-namespace',
            'value' => 'value',
        ],
    ],
];

$xmlService = new Service;
$xml = $xmlService->write('root', $array);

When using {} it will output empty namespace and if you do not put it it will not output namespace.

The result xml of the code is:

<?xml version="1.0"?>
<root>
    <test>
        <empty-namespace xmlns="">value</empty-namespace>
        <no-namespace>value</no-namespace>
    </test>
</root>
evert commented 4 years ago

Yes, but that is a design decision.

Removing that can create collisions with arrays that contain a value, name, and attribute key. It was important to have a canonical non-conflicting value for these, and it was a design goal to make roundtripping (reading and subsequently writing the result again) to yield the same XML document (semantically, not byte-for-byte).

Consider the following:

[
  'name' => 'foo',
  'value' => 'bar',
  'attributes' => ['a' => 'b']
}

This could be interpreted as:

<foo a="b">bar</foo>

or:

<name>foo</name>
<value>bar</value>
<attributes><a>b</a></attributes>

By decoding non-namespaced elements to {}name, there's no ambiguity.

I realize that it is a little ugly for non-namespaced XML, but deserializers help with this. They basically give you a hook into every element and output exactly the structure you need.

evert commented 4 years ago

This library actually ships with 1 such deserializer: KeyValue. It will strip namespaces.

Caveat is that it deserializes strictly to a key value structure, so no more arrays with name, value, attributes. With a good combination of deserializers, you can create a very clean output.