sanand0 / xmljson

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice versa.
MIT License

Data mismatch #14

Open jn0 opened 7 years ago

jn0 commented 7 years ago

Look at this example:

    from collections import OrderedDict
    from xml.etree.ElementTree import fromstring
    import json
    import xmljson
    bf = xmljson.BadgerFish(dict_type=OrderedDict)
    q = bf.data(fromstring('<a p="1">x<b r="2">y</b>z</a>'))
    print(json.dumps(q, indent=2)) # note this item ^ (z)!

Output will be:

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        }
      }
    }

Where is the z value?

Tested with

xmljson was installed via pip.

I'd expect something like

    {
      "a": {
        "@p": 1, 
        "$": "x", 
        "b": {
          "@r": 2, 
          "$": "y"
        },
        "$$": "z"
      }
    }
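
A likely explanation, sketched with only the standard library: ElementTree stores text that follows a closing tag in the `tail` attribute of that child element, not in the parent's `text`. A converter that reads only `text` would therefore never see "z". That xmljson behaves this way is an assumption here, not something confirmed in this thread.

```python
from xml.etree.ElementTree import fromstring

root = fromstring('<a p="1">x<b r="2">y</b>z</a>')
b = root.find('b')

# Text before the first child lives on the parent element.
print(repr(root.text))  # 'x'
# Text inside <b> lives on <b> itself.
print(repr(b.text))     # 'y'
# Text after </b> lives on the child, as its tail.
print(repr(b.tail))     # 'z'
```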

jn0 commented 7 years ago

Plus, I'd like to preserve XML comments too, say under a "!" property name (and serialize them the same way: !, !!, !!!, etc.):

    {
      "!": "comment 1",
      "some": { "more": "JSON here" },
      "!!": "comment 2"
    }
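
For context, ElementTree's default parser drops comments, so any such convention would first need a parser that keeps them. Below is a minimal sketch, assuming Python 3.8 or later (where `TreeBuilder` accepts `insert_comments`). The `!`, `!!` key naming is only the proposal above, and `comment_keys` is a hypothetical helper, not an xmljson function.

```python
import xml.etree.ElementTree as ET

def comment_keys(xml_text):
    """Hypothetical helper: map '!', '!!', ... to the comments directly under the root."""
    # Ask the tree builder to keep comments as nodes instead of discarding them.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    result = {}
    for child in root:
        if child.tag is ET.Comment:  # comment nodes use ET.Comment as their tag
            key = '!' * (len(result) + 1)
            result[key] = child.text
    return result

comments = comment_keys('<a><!--comment 1--><some/><!--comment 2--></a>')
print(comments)
```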

sanand0 commented 7 years ago

@jn0 -- on the comments and text fragments (your "z"), the BadgerFish convention is silent. There is a bi-directional extension that uses $1, $2, etc. for text fragments and !1, !2, etc. for comments -- but this is not backward compatible with BadgerFish.

Also, if we did extend this, I'd like it to also work (to the extent possible) for the other conventions we're implementing -- i.e. GData, Yahoo and Parker.

Any thoughts on how you might structure the JSON attributes for these?
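
To make the `$1`, `$2` idea concrete, here is a rough sketch of how numbered text fragments could be collected. `numbered_fragments` is a hypothetical helper, not part of xmljson. It leaves attribute values as strings and does not handle repeated child tags.

```python
import json
from collections import OrderedDict
from xml.etree.ElementTree import fromstring

def numbered_fragments(elem):
    """Hypothetical: convert elem to a dict, keeping every text fragment as $1, $2, ..."""
    out = OrderedDict(('@' + k, v) for k, v in elem.attrib.items())
    n = 0

    def add_text(text):
        nonlocal n
        if text and text.strip():
            n += 1
            out['$%d' % n] = text.strip()

    add_text(elem.text)                  # text before the first child
    for child in elem:
        # A repeated tag would overwrite the earlier one; acceptable for a sketch.
        out[child.tag] = numbered_fragments(child)
        add_text(child.tail)             # text after this child's closing tag
    return out

result = {'a': numbered_fragments(fromstring('<a p="1">x<b r="2">y</b>z</a>'))}
print(json.dumps(result, indent=2))
```

With this layout the key order records where each fragment sat relative to the child elements, which is what a round trip back to XML would need.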

jn0 commented 7 years ago

@sanand0 not much actually: I'm a newbie here, in XML land :) But $2 and !2 look no worse than $$ and !! (as well as #2 for CDATA). It seems quite obvious to me that losing parts of the source isn't good enough anyway. Maybe just add a bidirectional=False argument to the BadgerFish constructor and act accordingly?

The only remaining point, I think, is to collect the parts into a tuple in the traditional dict for the non-bidirectional mode. This will lose the exact positions but still preserve the values...
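
One reading of that non-bidirectional idea, as a sketch: gather all of an element's own text fragments into a single tuple, so the values survive but their positions between child elements do not. `grouped_text` is a hypothetical helper, not an xmljson function.

```python
from xml.etree.ElementTree import fromstring

def grouped_text(elem):
    """Hypothetical: collect an element's own text fragments into one tuple."""
    # The element's leading text plus the tail of each child, in document order.
    parts = [elem.text] + [child.tail for child in elem]
    return tuple(p.strip() for p in parts if p and p.strip())

root = fromstring('<a p="1">x<b r="2">y</b>z</a>')
print(grouped_text(root))            # ('x', 'z')
print(grouped_text(root.find('b')))  # ('y',)
```

Note that `json.dumps` would serialize such a tuple as a JSON array.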

AlexandraBomane commented 7 years ago

Hi @sanand0 !

I have the same problem as @jn0: some data is missing from my JSON output. I think there is a recursion problem in your parser. Could you take a look at that, please?

Best, Alexandra

dagwieers commented 7 years ago

This problem also impacts the Abdera and Cobra conventions I implemented. The problem itself is marked as a TODO (with a commented-out test) in the tests.