sanand0 / xmljson

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice-versa.
MIT License
122 stars, 33 forks

Parker convention is not implemented correctly #40

Open AndreiPashkin opened 5 years ago

AndreiPashkin commented 5 years ago

It seems that the way the library converts XML using the Parker convention differs from how the convention was originally defined in this repository: https://github.com/doekman/xml2json-xslt

Here is the comparison:

Input data

<root>
    <person>
        <age>12</age>
        <height>1.73</height>
    </person>
    <person>
        <age>12</age>
        <height>1.73</height>
    </person>
</root>

Using xmltodict

Output:

{
    "person": [
        {
            "age": 12,
            "height": 1.73
        },
        {
            "age": 12,
            "height": 1.73
        }
    ]
}

Using XSLT document from the original Google repository

Output:

{
    "root":[
        {
            "age":12,
            "height":1.73
        },
        {
            "age":12,
            "height":1.73
        }
    ]
}

dagwieers commented 5 years ago

So xmljson (this project) returns this:

{
    "person": [
        {
            "age": 12,
            "height": 1.73
        }, {
            "age": 12,
            "height": 1.73
        }
    ]
}

Or this when using preserve_root=True:

{
    "root": {
        "person": [
            {
                "age": 12,
                "height": 1.73
            }, {
                "age": 12,
                "height": 1.73
            }
        ]
    }
}

To me the XSLT translation seems wrong, it removes information (i.e. the person sub-element). Was there a question?
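
For readers who want to reproduce the two outputs above without installing anything, here is a minimal standard-library sketch of a Parker-style conversion. It is not xmljson's implementation; the names `parker_like`, `convert` and `SAMPLE_XML` are hypothetical, and the `preserve_root` parameter only mimics the option mentioned above. Attributes are ignored, and repeated sibling tags become a list.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """
<root>
    <person><age>12</age><height>1.73</height></person>
    <person><age>12</age><height>1.73</height></person>
</root>
"""


def _scalar(text):
    """Convert leaf text to int or float when possible, else keep the string."""
    text = (text or "").strip()
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text or None  # empty elements become None


def parker_like(element):
    """Convert an element's content (not its own tag) to dicts, lists and scalars."""
    children = list(element)
    if not children:
        return _scalar(element.text)
    result = {}
    for child in children:
        value = parker_like(child)
        if child.tag in result:
            # A repeated tag: promote the existing value to a list, then append.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def convert(xml_text, preserve_root=False):
    """Parse XML text; optionally wrap the result under the root tag."""
    root = ET.fromstring(xml_text)
    data = parker_like(root)
    return {root.tag: data} if preserve_root else data


print(convert(SAMPLE_XML))
print(convert(SAMPLE_XML, preserve_root=True))
```

In this sketch the person key survives in both modes; only the outer root key depends on `preserve_root`.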

dagwieers commented 5 years ago

BTW, looking at the examples at https://github.com/doekman/xml2json-xslt/tree/master/unittests, your depiction of what xml2json-xslt returns is incorrect: it does not turn the root element into an element named root.

AndreiPashkin commented 5 years ago

That repository is a reference implementation of the Parker convention.

To me the XSLT translation seems wrong, it removes information (i.e. the person element).

I think it makes perfect sense, because in XML you usually have something like:

<people>
  <person>
    <name>John</name>
    <age>10</age>
  </person>
</people>

And in JSON you usually don't want to see {"people": {"person": [...]}} structure but {"people": [...]}.

your depiction of what xml2json-xslt returns is incorrect

I generated the output examples using the XSLT file and the xsltproc utility.

AndreiPashkin commented 5 years ago

Input: https://github.com/doekman/xml2json-xslt/blob/master/unittests/issue1.xml
Output: https://github.com/doekman/xml2json-xslt/blob/master/unittests/issue1.json.expected

Root element seems to be preserved actually

dagwieers commented 5 years ago

Root element seems to be preserved actually

@AndreiPashkin Exactly, so your example above is incorrect. And the output is the same as xmljson.

dagwieers commented 5 years ago

And in JSON you usually don't want to see {"people": {"person": [...]}} structure but {"people": [...]}.

That seems very wrong. Why would someone want a sub-element (i.e. person) to disappear when doing a conversion? Why the first sub-element?

AndreiPashkin commented 5 years ago

That seems very wrong. Why would someone want a sub-element (i.e. person) to disappear when doing a conversion? Why the first sub-element?

  1. Because why would you want to have people -> person structure? It is redundant.
  2. Because that's what developers expect from a Parker convention implementation.

@AndreiPashkin Exactly, so your example above is incorrect. And the output is the same as xmljson.

Hm, that is correct. I don't know why xsltproc worked that way. But that's not important to me anyway; I filed the issue because of the above problem.

dagwieers commented 5 years ago

  1. Because why would you want to have people -> person structure? It is redundant.

The redundancy is in the original data; why would a conversion remove a sub-element layer? That doesn't make sense. A conversion has no semantics.

Maybe you were confused or your examples were wrong and you wanted to preserve/remove the root-element. That is possible with xmljson using preserve_root=True/False as my first example showed.

I filed the issue because of the above problem.

But there does not seem to be a problem, so can I close this issue?

AndreiPashkin commented 5 years ago

The redundancy is in the original data; why would a conversion remove a sub-element layer? That doesn't make sense. A conversion has no semantics.

In the original data there is no redundancy. How else would you express an array of person elements in XML? Only with person tags. But in JSON, the equivalent of the person tags are the dictionaries themselves.

Also, that's how the Parker convention defines it. I think it makes sense to implement a standard according to the standard.
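
To make the disputed transformation concrete, here is a hypothetical post-processing sketch that turns {"people": {"person": [...]}} into {"people": [...]}. It is not part of xmljson, and it makes no claim about what the Parker convention requires, which is the point under discussion in this thread. The function name `collapse_single_list` is an assumption made for illustration.

```python
def collapse_single_list(data):
    """Recursively replace {"wrapper": {"item": [...]}} with {"wrapper": [...]}."""
    if isinstance(data, list):
        return [collapse_single_list(item) for item in data]
    if not isinstance(data, dict):
        return data
    result = {}
    for key, value in data.items():
        value = collapse_single_list(value)
        # Collapse only when the wrapper holds exactly one key whose value is a list.
        if isinstance(value, dict) and len(value) == 1:
            (only_value,) = value.values()
            if isinstance(only_value, list):
                value = only_value
        result[key] = value
    return result


converted = {
    "people": {
        "person": [
            {"name": "John", "age": 10},
            {"name": "Jane", "age": 11},
        ]
    }
}
print(collapse_single_list(converted))
```

One caveat with this approach: a wrapper containing a single person element is converted to a plain dictionary rather than a list, so the sketch leaves it untouched and the output shape then depends on how many children the input has.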