dvilajeti01 closed 5 years ago
Fixed
Original:
if root.text and self.text_content is not None:
    text = root.text.strip()
    if text:
        if self.simple_text and len(children) == len(root.attrib) == 0:
            value = self._fromstring(text)
        else:
            value[self.text_content] = self._fromstring(root.text)
Revised:
if root.text and self.text_content is not None:
    text = root.text.strip()
    if text:
        if self.simple_text and len(children) == len(root.attrib) == 0:
            value = self._fromstring(text)
        else:
            value[self.text_content] = self._fromstring(root.text)
else:
    if self.simple_text and len(children) == len(root.attrib) == 0:
        value = self._fromstring(None)
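Which branch an empty element takes in the code above depends on what ElementTree stores in .text. The standalone snippet below (standard-library xml.etree.ElementTree only, not xmljson) prints that value for a few "empty" forms. A self-closing tag and an open/close pair with nothing in between both give None, so they fail the "if root.text" test outright, while whitespace-only content gives a non-empty string that only becomes '' after strip().

```python
import xml.etree.ElementTree as ET

# Several "empty-looking" forms of the same element
samples = {
    "self-closing": "<element/>",
    "open/close pair": "<element></element>",
    "whitespace only": "<element>\n  </element>",
    "with attribute": '<element key="val"></element>',
}

for label, xml in samples.items():
    root = ET.fromstring(xml)
    # root.text is None when the element has no character data at all
    print(f"{label:16} text={root.text!r} attrib={root.attrib}")
```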
@dvilajeti01 -- which convention would you like this for? The reason is that the conventions we use already have a standard for self-closing tags. For example, <root><p/></root> converts to:
abdera {"root": {"p": {}}}
badgerfish {"root": {"p": {}}}
cobra {"root": {"attributes": {}, "children": [{"p": {"attributes": {}}}]}}
gdata {"root": {"p": {}}}
parker {"p": null}
yahoo {"root": {"p": {}}}
Which convention would you like to change to an empty string, please? I'd also need to refer to the original standards for these to see if this is allowed or not.
Sorry, I realized you meant the Yahoo convention. The convention does say that "Simple XML elements (elements that contain only content) become string/value pairs." By this logic, <element/> should become {"element": ""} and not {"element": {}}.
I'll add a fix for this and push it.
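The rule quoted above can be sketched as a small standalone converter. This is not xmljson's code: yahoo_like and convert are hypothetical names, the sketch keeps every value as a string (no number or boolean conversion), assumes Yahoo's "content" key for text that sits next to attributes or children, and collects repeated child tags into a list. The empty-element check runs last, after attributes and children have been gathered, so <element/>, <element></element> and whitespace-only content all become "", while an element that carries attributes stays an object.

```python
import json
import xml.etree.ElementTree as ET


def yahoo_like(elem):
    """Hypothetical Yahoo-style converter: returns the JSON value for elem."""
    value = dict(elem.attrib)  # attributes become plain key/value pairs
    for child in elem:
        child_value = yahoo_like(child)
        if child.tag in value:
            # A repeated key is collected into a list
            existing = value[child.tag]
            if not isinstance(existing, list):
                value[child.tag] = existing = [existing]
            existing.append(child_value)
        else:
            value[child.tag] = child_value
    text = (elem.text or "").strip()
    if text:
        if not value:
            return text           # simple element (content only) -> string
        value["content"] = text   # text alongside attributes/children
    if not value:
        return ""                 # no attributes, children or text -> ""
    return value


def convert(xml):
    root = ET.fromstring(xml)
    return {root.tag: yahoo_like(root)}


print(json.dumps(convert("<root><p/></root>")))
print(json.dumps(convert('<root><x key="val"></x><y> hi </y></root>')))
```

With these inputs the first line prints {"root": {"p": ""}} and the second {"root": {"x": {"key": "val"}, "y": "hi"}}.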
@sanand0 Hey, I just tried the new code and ran into an error:
Traceback (most recent call last):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 151, in _run_module_as_main
    mod_name, loader, code, fname = _get_module_details(mod_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 109, in _get_module_details
    return _get_module_details(pkg_main_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/runpy.py", line 101, in _get_module_details
    loader = get_loader(mod_name)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 464, in get_loader
    return find_loader(fullname)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 474, in find_loader
    for importer in iter_importers(fullname):
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pkgutil.py", line 430, in iter_importers
    __import__(pkg)
  File "/Users/danielvilajeti/enviroments/xml_converter/lib/python2.7/site-packages/xmljson/__init__.py", line 57
    except (TypeError, ValueError):
    ^
SyntaxError: invalid syntax
@dvilajeti01 I'm unable to reproduce this in any version of Python. Could you please try installing a dev version:
git clone https://github.com/sanand0/xmljson
cd xmljson
pip uninstall xmljson
pip install -e .
... and see if this problem persists?
OK, I just installed the dev version and the conversion worked.
There seems to be another bug
so given
This is the change I propose to make (I don't know if you got a chance to look at it):
if root.text and self.text_content is not None:
    text = root.text.strip()
    if text:
        if self.simple_text and len(children) == len(root.attrib) == 0:
            value = self._fromstring(text)
        else:
            value[self.text_content] = self._fromstring(text)
else:
    if self.simple_text and len(children) == len(root.attrib) == 0:
        value = ''
Good catch @dvilajeti01 -- thanks.
The fix you proposed doesn't handle the <x key="val"></x> case, I think. That's why I took a different approach in 2ecc206. Could you please check if it works now?
OK, so I checked it and it works, but a new problem arises: the order of the attributes changes for some reason. Here are examples of the difference. The first is the original Yahoo convention output before any fixes, and the second is the Yahoo convention output after the fix.
Pre:
{
  "transSet": {
    "longId": "2019-01-03",
    "periodID": "2",
    "periodname": "Day",
    "shortId": "505",
    "site": "AB123",
    "openedTime": "2019-01-02T06:03:51-05:00",
    "closedTime": "2019-01-03T06:07:45-05:00",
    "startTotals": {
      "insideSales": "5057863.54",
      "insideGrand": "5230398.56",
      "outsideSales": "1380117.57",
      "outsideGrand": "1380117.57",
      "overallSales": "6437981.11",
      "overallGrand": "6610516.13"
    }
Post:
{
  "transSet": {
    "periodname": "Day",
    "shortId": "505",
    "periodID": "2",
    "site": "AB123",
    "longId": "2019-01-03",
    "openedTime": "2019-01-02T06:03:51-05:00",
    "closedTime": "2019-01-03T06:07:45-05:00",
    "startTotals": {
      "insideSales": "5057863.54",
      "insideGrand": "5230398.56",
      "outsideSales": "1380117.57",
      "outsideGrand": "1380117.57",
      "overallSales": "6437981.11",
      "overallGrand": "6610516.13"
    }
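JSON objects are unordered collections of name/value pairs (RFC 8259), so a quick way to check whether the Pre and Post outputs carry the same data is to parse both and compare them as dicts, which ignores key order. The strings below are shortened, hypothetical stand-ins for the two outputs above (closed off so they parse); they are not the full data.

```python
import json

# Shortened stand-ins for the "Pre" and "Post" outputs: same members, different order
pre = '''{"transSet": {"longId": "2019-01-03", "periodID": "2",
          "periodname": "Day", "shortId": "505", "site": "AB123"}}'''
post = '''{"transSet": {"periodname": "Day", "shortId": "505",
           "periodID": "2", "site": "AB123", "longId": "2019-01-03"}}'''

a, b = json.loads(pre), json.loads(post)

# Dict equality compares content only, not insertion order
same_content = a == b
# Key order as it appeared in each document
order_pre = list(a["transSet"])
order_post = list(b["transSet"])

print(same_content)                # content matches
print(order_pre == order_post)     # order differs
# A canonical text form for diffing: sort keys before dumping
print(json.dumps(a, sort_keys=True) == json.dumps(b, sort_keys=True))
```

If the document order itself matters downstream, one thing worth checking is the Python version: on Python 2.7 (as in the traceback above) ElementTree keeps attributes in a plain dict, whose iteration order is not guaranteed to follow the document.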
How can I treat self-closing tags or empty elements as "element": "" instead of "element": {}?