Closed leli1337 closed 2 years ago
@staabm Consider an xml file called somexml.xml with this content:
<?xml version="1.0"?>
<books>
<book>
<title>Snow Crash</title>
<author>Neil Stephenson</author>
</book>
<book>
<title>Dune</title>
<author>Frank Herbert</author>
</book>
</books>
If you parse it and then convert to XML again (the following code) will result in an xml different from the initial.
$path = 'somexml.xml';
$fileStream = fopen($path, 'rw');
$xmlService = new Service;
$result = $xmlService->parse($fileStream);
$xml = $xmlService->write('root', $result);
<?xml version="1.0"?>
<root>
<book xmlns="">
<title xmlns="">Snow Crash</title>
<author xmlns="">Neil Stephenson</author>
</book>
<book xmlns="">
<title xmlns="">Dune</title>
<author xmlns="">Frank Herbert</author>
</book>
</root>
With the change made to code both the input and output will be the same.
could you turn you excellent example into a unit test?
could you turn you excellent example into a unit test?
Yes, I will create a new unit test for Reader.
Per method comment of the changed method in https://github.com/sabre-io/xml/pull/182/files#diff-3a8e508b4baf36a62dd9aca54f4c6387R30 I guess the fix is at the wrong spot
Yes, the getClarkNotation shouldn't change. The change should be in the writer, not the reader. It should also consider the fact that a user might write an XML file with namespaces, but it could still add elements in the "empty" namespace with a prefix. So this might not be a super easy fix, but still doable I think.
It should be noted that the output is correct, it's just ugly
@staabm @evert The problem is not in writer. Consider the following code
The php array from parsing the xml example i provided is: As you can see the name contains {} concatenated.
[
[
'name' => '{}book',
'value' => [
[
'name' => '{}title',
'value' => 'Snow Crash',
'attributes' => [],
],
[
'name' => '{}author',
'value' => 'Neil Stephenson',
'attributes' => [],
],
],
'attributes' => [],
],
[
'name' => '{}book',
'value' => [
[
'name' => '{}title',
'value' => 'Dune',
'attributes' => [],
],
[
'name' => '{}author',
'value' => 'Frank Herbert',
'attributes' => [],
],
],
'attributes' => [],
],
];
Another example showing the right behavior of writer is this:
$array = [
'name' => 'test',
'value' => [
[
'name' => '{}empty-namespace',
'value' => 'value',
], [
'name' => 'no-namespace',
'value' => 'value',
],
],
];
$xmlService = new Service;
$xml = $xmlService->write('root', $array);
When using {} it will output empty namespace and if you do not put it it will not output namespace.
The result xml of the code is:
<?xml version="1.0"?>
<root>
<test>
<empty-namespace xmlns="">value</empty-namespace>
<no-namespace>value</no-namespace>
</test>
</root>
Yes, but that is a design decision.
Removing that can create collisions with arrays that contain a value
, name
, and attribute
key. It was important to have a canonical non-conflicting value for these, and it was a design goal to make roundtripping (reading and subsequently writing the result again) to yield the same XML document (semantically, not byte-for-byte).
Consider the following:
[
'name' => 'foo',
'value' => 'bar',
'attributes' => ['a' => 'b']
}
This could be interpreted as:
<foo a="b">bar</foo>
or:
<name>foo</name>
<value>bar</value>
<attributes><a>b</a></attributes>
By decoding non-namespaced elements to {}name
, there's no ambiguity.
I realize that it is a little ugly for non-namespaced XML, but deserializers help with this. They basically give you a hook into every element and output exactly the structure you need.
This library actually ships with 1 such deserializer: KeyValue. It will strip namespaces.
Caveat is that it deserializes strictly to a key value structure, so no more arrays with name, value, attributes. With a good combination of deserializers, you can create a very clean output.
could you elaborate a bit what you are trying to achieve with this change and what the problems looks like you are trying to fix?