sanand0 / xmljson

xmljson converts XML into Python dictionary structures (trees, like in JSON) and vice versa.
MIT License

Handling Self Closing Tags #37

dvilajeti01 closed this issue 5 years ago

dvilajeti01 commented 5 years ago

How can I treat self-closing tags or empty elements as "element": "" instead of "element": {}?
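If changing the converter itself isn't an option, one workaround is to post-process the converted structure. The sketch below uses a hypothetical helper, `empty_to_string` (not part of xmljson), that walks nested dicts and lists and replaces every empty dict with an empty string. It assumes the output contains only dicts (including `OrderedDict`), lists, and scalars.

```python
from collections import OrderedDict


def empty_to_string(node):
    """Return a copy of node with every empty dict replaced by "".

    Hypothetical helper, not part of xmljson. Mappings keep their
    key order; lists are processed item by item.
    """
    if isinstance(node, dict):
        if not node:
            return ""
        # type(node)() keeps an OrderedDict an OrderedDict
        result = type(node)()
        for key, value in node.items():
            result[key] = empty_to_string(value)
        return result
    if isinstance(node, list):
        return [empty_to_string(item) for item in node]
    return node


# Shape of the Yahoo output for <root><p/></root> quoted later in this thread
converted = OrderedDict([("root", OrderedDict([("p", OrderedDict())]))])
print(empty_to_string(converted))
```

Note that this rewrites every empty object, so it only makes sense when an empty dict can only have come from an empty element.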

dvilajeti01 commented 5 years ago

Fixed

Original:

    if root.text and self.text_content is not None:
        text = root.text.strip()
        if text:
            if self.simple_text and len(children) == len(root.attrib) == 0:
                value = self._fromstring(text)
            else:
                value[self.text_content] = self._fromstring(root.text)

Revised:

    if root.text and self.text_content is not None:
        text = root.text.strip()
        if text:
            if self.simple_text and len(children) == len(root.attrib) == 0:
                value = self._fromstring(text)
            else:
                value[self.text_content] = self._fromstring(root.text)
    else:
        if self.simple_text and len(children) == len(root.attrib) == 0:
            value = self._fromstring(None)
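The `else` branch matters because a self-closing tag has no text node at all: with the standard library's `xml.etree.ElementTree`, both `<p/>` and `<p></p>` parse with `text` set to `None`, so an `if root.text ...` test is never true for them. The snippet below only illustrates that parser behaviour; `is_empty_element` is a hypothetical helper mirroring the "no text, no children, no attributes" condition, not xmljson code.

```python
import xml.etree.ElementTree as ET


def is_empty_element(elem):
    """True if elem has no non-blank text, no attributes and no children.

    Hypothetical helper mirroring the condition in the revised code:
    no usable text and len(children) == len(root.attrib) == 0.
    """
    has_text = bool(elem.text and elem.text.strip())
    return not has_text and len(elem) == 0 and len(elem.attrib) == 0


root = ET.fromstring("<root><p/><q></q><r>text</r><s a='1'/></root>")
for child in root:
    # <p/> and <q></q> both parse with child.text set to None
    print(child.tag, repr(child.text), is_empty_element(child))
```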

sanand0 commented 5 years ago

@dvilajeti01 -- which convention would you like this for? The reason is that the conventions we use already have a standard for self-closing tags. For example: <root><p/></root> converts to:

abdera      {"root": {"p": {}}}
badgerfish  {"root": {"p": {}}}
cobra       {"root": {"attributes": {}, "children": [{"p": {"attributes": {}}}]}}
gdata       {"root": {"p": {}}}
parker      {"p": null}
yahoo       {"root": {"p": {}}}

Which convention would you like to change to an empty string, please? I'd also need to refer to the original standards for these to see if this is allowed or not.

sanand0 commented 5 years ago

Sorry, I realized you meant the Yahoo convention. The convention does say that "Simple XML elements (elements that contain only content) become string/value pairs." By this logic, <element/> should become {"element": ""} and not {"element": {}}.

I'll add a fix for this and push it.
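To make that rule concrete, here is a deliberately small converter in the spirit of the Yahoo convention, built only on `xml.etree.ElementTree`. It is a sketch under stated assumptions, not xmljson's implementation: attributes become plain keys, repeated child tags become lists, text inside non-simple elements goes under a `content` key, all values stay strings, and an element with no text, attributes, or children becomes `""`. The names `yahoo_like` and `convert` are hypothetical.

```python
import xml.etree.ElementTree as ET


def yahoo_like(elem):
    """Convert an Element to Python data, loosely following the Yahoo rules.

    Sketch only (not xmljson's code); values are kept as strings.
    """
    text = (elem.text or "").strip()
    children = list(elem)
    if not children and not elem.attrib:
        # Simple element: a string/value pair, so an empty element gives ""
        return text
    value = dict(elem.attrib)
    for child in children:
        converted = yahoo_like(child)
        if child.tag not in value:
            value[child.tag] = converted
        elif isinstance(value[child.tag], list):
            value[child.tag].append(converted)
        else:
            # Second occurrence of a tag: switch to a list
            value[child.tag] = [value[child.tag], converted]
    if text:
        value["content"] = text
    return value


def convert(xml_string):
    root = ET.fromstring(xml_string)
    return {root.tag: yahoo_like(root)}


print(convert("<root><p/><q>hi</q><x key='val'></x></root>"))
# {'root': {'p': '', 'q': 'hi', 'x': {'key': 'val'}}}
```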

dvilajeti01 commented 5 years ago

@sanand0 Hey, I just ran the new code and ran into an error:

    Traceback (most recent call last):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 151, in _run_module_as_main
        mod_name, loader, code, fname = _get_module_details(mod_name)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 109, in _get_module_details
        return _get_module_details(pkg_main_name)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 101, in _get_module_details
        loader = get_loader(mod_name)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 464, in get_loader
        return find_loader(fullname)
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 474, in find_loader
        for importer in iter_importers(fullname):
      File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 430, in iter_importers
        __import__(pkg)
      File "/Users/danielvilajeti/enviroments/xml_converter/lib/python2.7/site-packages/xmljson/__init__.py", line 57
        except (TypeError, ValueError):
        ^
    SyntaxError: invalid syntax

sanand0 commented 5 years ago

@dvilajeti01 I'm unable to reproduce this in any version of Python. Could you please try installing a dev version:

git clone https://github.com/sanand0/xmljson
cd xmljson
pip uninstall xmljson
pip install -e .

... and see if this problem persists?
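When an installed copy behaves differently from the repository, it can help to confirm which file Python actually imports and whether that file compiles. The sketch below is a hypothetical Python 3.4+ helper (`check_module_source` is not part of xmljson or pip; the traceback above is from Python 2.7, which lacks `importlib.util.find_spec`). It locates a top-level module without executing it and runs its source through the built-in `compile` to surface a `SyntaxError` with its line number.

```python
import importlib.util


def check_module_source(name):
    """Return (path, problem) for a top-level module without importing it.

    Hypothetical diagnostic helper. problem is None when the source
    compiles cleanly, otherwise a short description of the issue.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None, "module not found"
    if not spec.has_location or not spec.origin.endswith(".py"):
        return spec.origin, "no Python source file to check"
    with open(spec.origin, "rb") as f:
        source = f.read()
    try:
        # compile() only parses the source; nothing runs or is written to disk
        compile(source, spec.origin, "exec")
    except SyntaxError as exc:
        return spec.origin, "line {}: {}".format(exc.lineno, exc.msg)
    return spec.origin, None


path, problem = check_module_source("xmljson")
print(path, problem or "compiles OK")
```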

dvilajeti01 commented 5 years ago

OK, I just installed the dev version and the conversion worked.

dvilajeti01 commented 5 years ago

There seems to be another bug:

given <tag>0</tag>, the output is "tag": "" rather than "tag": 0.
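One common way this kind of bug arises in Python (an assumption; the thread doesn't show the code that produced it) is using `or` to supply a default: `0 or ""` evaluates to `""` because `0` is falsy. Deciding emptiness from the raw text, before conversion, avoids it. All three functions below are hypothetical; `to_number_or_text` stands in for a value converter such as xmljson's `_fromstring`.

```python
def to_number_or_text(text):
    """Hypothetical stand-in for a value converter: int if possible, else text."""
    try:
        return int(text)
    except ValueError:
        return text


def simple_value_buggy(text):
    text = (text or "").strip()
    # Bug: 0 is falsy, so `0 or ""` evaluates to "" instead of 0
    return to_number_or_text(text) or ""


def simple_value_fixed(text):
    text = (text or "").strip()
    # Check the raw text for emptiness, then convert
    return to_number_or_text(text) if text else ""


for raw in ["0", "", None, "7"]:
    print(repr(raw), repr(simple_value_buggy(raw)), repr(simple_value_fixed(raw)))
```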

dvilajeti01 commented 5 years ago

This is the change I propose; I don't know if you got a chance to look at it:

    if root.text and self.text_content is not None:
        text = root.text.strip()
        if text:
            if self.simple_text and len(children) == len(root.attrib) == 0:
                value = self._fromstring(text)
            else:
                value[self.text_content] = self._fromstring(text)
    else:
        if self.simple_text and len(children) == len(root.attrib) == 0:
            value = ''

sanand0 commented 5 years ago

Good catch @dvilajeti01 -- thanks.

The fix you proposed doesn't handle <x key="val"></x>, I think. That's why I took a different approach in 2ecc206. Could you please check whether it works now?

dvilajeti01 commented 5 years ago

OK, I checked it and it works, but a new problem arises: the order of the attributes changes for some reason. Here are examples of the difference. The first is the output of the original Yahoo convention before any fixes; the second is the Yahoo convention after the fix.

Pre:

{
  "transSet": {
    "longId": "2019-01-03",
    "periodID": "2",
    "periodname": "Day",
    "shortId": "505",
    "site": "AB123",
    "openedTime": "2019-01-02T06:03:51-05:00",
    "closedTime": "2019-01-03T06:07:45-05:00",
    "startTotals": {
      "insideSales": "5057863.54",
      "insideGrand": "5230398.56",
      "outsideSales": "1380117.57",
      "outsideGrand": "1380117.57",
      "overallSales": "6437981.11",
      "overallGrand": "6610516.13"
    }

Post:

{
  "transSet": {
    "periodname": "Day", 
    "shortId": "505", 
    "periodID": "2", 
    "site": "AB123", 
    "longId": "2019-01-03", 
    "openedTime": "2019-01-02T06:03:51-05:00", 
    "closedTime": "2019-01-03T06:07:45-05:00", 
    "startTotals": {
      "insideSales": "5057863.54", 
      "insideGrand": "5230398.56", 
      "outsideSales": "1380117.57", 
      "outsideGrand": "1380117.57", 
      "overallSales": "6437981.11", 
      "overallGrand": "6610516.13"
    }
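In the excerpts above, both outputs contain the same keys and values; only the order differs. JSON objects are defined as unordered collections of name/value pairs (RFC 8259), so a consumer can parse both documents and compare them as mappings. The sketch below uses an excerpt of the `transSet` attributes quoted above; `same_content` and `key_order` are hypothetical helpers. If the order itself matters, the converter must build its output with an order-preserving mapping such as `collections.OrderedDict` (plain `dict` only guarantees insertion order from Python 3.7). This is a general note, not a diagnosis of 2ecc206.

```python
import json

pre = """{"transSet": {"longId": "2019-01-03", "periodID": "2",
  "periodname": "Day", "shortId": "505", "site": "AB123"}}"""
post = """{"transSet": {"periodname": "Day", "shortId": "505",
  "periodID": "2", "site": "AB123", "longId": "2019-01-03"}}"""


def same_content(a, b):
    """True if two JSON documents are equal when key order is ignored."""
    # json.loads returns dicts, and dict equality ignores key order
    return json.loads(a) == json.loads(b)


def key_order(doc, *path):
    """Return the keys of the object at `path`, in document order (Python 3.7+)."""
    node = json.loads(doc)
    for key in path:
        node = node[key]
    return list(node)


print(same_content(pre, post))  # True
print(key_order(pre, "transSet"))
print(key_order(post, "transSet"))
```

`json.dumps(data, indent=2, sort_keys=True)` is another way to get a stable order, though it sorts keys alphabetically rather than restoring document order.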