ssato / python-anyconfig

Python library provides common APIs to load and dump configuration files in various formats
MIT License

xml backend does not follow 'the spec' #62

Open ssato opened 7 years ago

ssato commented 7 years ago

The XML backend does not follow 'the spec' described at http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html, whereas xmltodict (https://github.com/martinblech/xmltodict) appears to.

In [1]: import anyconfig.backend.xml as X, xml.etree.cElementTree as ET

In [2]: X.elem_to_container(ET.XML("<e/>"), dict, {})
Out[2]: {'e': {}}

In [3]: X.elem_to_container(ET.XML("<e>text</e>"), dict, {})
Out[3]: {'e': 'text'}

In [4]: X.elem_to_container(ET.XML("<e name='value'/>"), dict, {})
Out[4]: {'e': {'@attrs': {'name': 'value'}}}

In [5]: X.elem_to_container(ET.XML('<e name="value">text</e>'), dict, {})
Out[5]: {'e': {'@attrs': {'name': 'value'}, '@text': 'text'}}

In [6]: X.elem_to_container(ET.XML('<e><a>text</a><b>text</b></e>'), dict, {})
Out[6]: {'e': {'@children': [{'a': 'text'}, {'b': 'text'}]}}

In [7]: X.elem_to_container(ET.XML('<e><a>text</a><a>text</a></e>'), dict, {})
Out[7]: {'e': {'@children': [{'a': 'text'}, {'a': 'text'}]}}

In [8]: X.elem_to_container(ET.XML('<e> text <a>text</a></e>'), dict, {})
Out[8]: {'e': {'a': 'text'}}

In [9]:
| Pattern | XML | JSON | Access |
|---------|-----|------|--------|
| 1 | <e/> | "e": null | o.e |
| 2 | <e>text</e> | "e": "text" | o.e |
| 3 | <e name="value" /> | "e": { "@name": "value" } | o.e["@name"] |
| 4 | <e name="value">text</e> | "e": { "@name": "value", "#text": "text" } | o.e["@name"] o.e["#text"] |
| 5 | <e> <a>text</a> <b>text</b> </e> | "e": { "a": "text", "b": "text" } | o.e.a o.e.b |
| 6 | <e> <a>text</a> <a>text</a> </e> | "e": { "a": ["text", "text"] } | o.e.a[0] o.e.a[1] |
| 7 | <e> text <a>text</a> </e> | "e": { "#text": "text", "a": "text" } | o.e["#text"] o.e.a |

from http://www.xml.com/pub/a/2006/05/31/converting-between-xml-and-json.html
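For reference, the seven patterns in the table can be sketched with nothing but the standard library. This is not anyconfig's or xmltodict's code; the function names elem_to_spec and xml_to_spec are made up for illustration, and the sketch ignores tail text (text that follows a child element), namespaces and comments.

```python
import xml.etree.ElementTree as ET


def elem_to_spec(elem):
    """Return (tag, value) for an element, following the seven patterns."""
    text = (elem.text or "").strip()
    children = list(elem)

    # Patterns 1 and 2: no attributes and no children -> None or plain text.
    if not elem.attrib and not children:
        return elem.tag, (text or None)

    value = {}
    # Patterns 3 and 4: each attribute becomes an "@name" key.
    for name, attr in elem.attrib.items():
        value["@" + name] = attr
    # Patterns 4 and 7: text next to attributes or children goes under "#text".
    if text:
        value["#text"] = text
    # Patterns 5 and 6: children are keyed by tag; repeated tags form a list.
    for child in children:
        tag, child_value = elem_to_spec(child)
        if tag in value:
            if not isinstance(value[tag], list):
                value[tag] = [value[tag]]
            value[tag].append(child_value)
        else:
            value[tag] = child_value
    return elem.tag, value


def xml_to_spec(xml_string):
    """Parse an XML string and wrap the root as {tag: value}."""
    tag, value = elem_to_spec(ET.fromstring(xml_string))
    return {tag: value}


print(xml_to_spec('<e name="value">text</e>'))
print(xml_to_spec("<e> <a>text</a> <a>text</a> </e>"))
```

Comparing these results with the elem_to_container outputs above shows where the two layouts differ: attributes are inlined instead of grouped under '@attrs', and children are keyed by tag instead of collected under '@children'.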

ssato commented 7 years ago

Patterns 2, 6 and 7 have looked OK from the beginning (although the key for text is '@text' by default, not '#text'), and patterns 1 and 5 are now OK as of commit a96121d and related ones.

In [3]: xml_s = """\
   ...: <?xml version="1.0" encoding="UTF-8"?>
   ...: <config name='foo'>
   ...:   <a>0</a>
   ...:   <b id="b0">bbb</b>
   ...:   <c/>
   ...:   <sect0>
   ...:     <d>x, y, z</d>
   ...:   </sect0>
   ...:   <list1>
   ...:     <item>0</item>
   ...:     <item>1</item>
   ...:     <item>2</item>
   ...:   </list1>
   ...: </config>
   ...: """

In [4]: xml_d = anyconfig.loads(xml_s, ac_parser="xml")

In [5]: xml_d
Out[5]:
{'config': {'@attrs': {'name': 'foo'},
  'a': '0',
  'b': {'@attrs': {'id': 'b0'}, '@text': 'bbb'},
  'c': None,
  'list1': {'@children': [{'item': '0'}, {'item': '1'}, {'item': '2'}]},
  'sect0': {'d': 'x, y, z'}}}

In [6]:

As for patterns 3, 4 and 6, I'm not sure that they are worth implementing, or that implementing them would be free of side effects.
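To make patterns 3 and 4 concrete, here is a rough post-processing sketch that works on a plain dict shaped like the output above; it does not call or change anyconfig. The helper name flatten_attrs and its parameters are hypothetical.

```python
def flatten_attrs(node, attrs_key="@attrs", text_key="@text"):
    """Return a copy with '@attrs' inlined as '@name' keys and '@text' as '#text'."""
    if isinstance(node, list):
        return [flatten_attrs(item, attrs_key, text_key) for item in node]
    if not isinstance(node, dict):
        return node  # strings and None pass through unchanged

    result = {}
    for key, value in node.items():
        if key == attrs_key:
            # Pattern 3: {'@attrs': {'name': 'foo'}} -> {'@name': 'foo'}
            for name, attr in value.items():
                result["@" + name] = attr
        elif key == text_key:
            # Pattern 4: text stored next to attributes moves to '#text'.
            result["#text"] = value
        else:
            result[key] = flatten_attrs(value, attrs_key, text_key)
    return result


loaded = {"config": {"@attrs": {"name": "foo"},
                     "a": "0",
                     "b": {"@attrs": {"id": "b0"}, "@text": "bbb"},
                     "c": None}}
print(flatten_attrs(loaded))
```

One possible side effect of this layout: an XML attribute that happens to be named "children" would be written to '@children' and collide with the key used for child elements, which grouping attributes under '@attrs' avoids.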

ssato commented 7 years ago

I was wrong to say that pattern 6 is not OK; however, I suspect that it might bring unwanted side effects. Commit d42428b brings minor updates:

In [2]: xml_s = """<?xml version="1.0" encoding="UTF-8"?>
   ...: <config name='foo'>
   ...:   <a>0</a>
   ...:   <b id="b0">bbb</b>
   ...:   <c/>
   ...:   <sect0>
   ...:     <d>x, y, z</d>
   ...:   </sect0>
   ...:   <list1>
   ...:     <item>0</item>
   ...:     <item>1</item>
   ...:     <item>2</item>
   ...:   </list1>
   ...:   <list2 id="list2">
   ...:     <item>i</item>
   ...:     <item>j</item>
   ...:   </list2>
   ...: </config>
   ...: """

In [3]: xml_d = anyconfig.loads(xml_s, ac_parser="xml")

In [4]: pprint.pprint(xml_d)
{'config': {'@attrs': {'name': 'foo'},
            'a': '0',
            'b': {'@attrs': {'id': 'b0'}, '@text': 'bbb'},
            'c': None,
            'list1': [{'item': '0'}, {'item': '1'}, {'item': '2'}],
            'list2': {'@attrs': {'id': 'list2'},
                      '@children': [{'item': 'i'}, {'item': 'j'}]},
            'sect0': {'d': 'x, y, z'}}}

In [5]:
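As an illustration of what pattern 6 would mean for the output above, the following sketch groups child lists by tag. It is only an assumption about how such a conversion could look, not anyconfig's code. The names merge_children and group_children are hypothetical, and the sketch assumes that every list in the loaded data is a list of single-key child dicts, as in 'list1' and 'list2'.

```python
CHILDREN_KEY = "@children"  # key used for child lists in the output above


def group_children(children):
    """Group [{tag: value}, ...] by tag; repeated tags collect into a list."""
    buckets = {}
    for child in children:
        for tag, value in child.items():
            buckets.setdefault(tag, []).append(merge_children(value))
    # A tag seen once keeps a bare value; a repeated tag becomes a list.
    return {tag: vals[0] if len(vals) == 1 else vals
            for tag, vals in buckets.items()}


def merge_children(node):
    """Return a copy of node with child lists folded into tag-keyed dicts."""
    if isinstance(node, list):
        return group_children(node)
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key == CHILDREN_KEY:
            result.update(group_children(value))
        else:
            result[key] = merge_children(value)
    return result


loaded = {"list1": [{"item": "0"}, {"item": "1"}, {"item": "2"}],
          "list2": {"@attrs": {"id": "list2"},
                    "@children": [{"item": "i"}, {"item": "j"}]}}
print(merge_children(loaded))
```

Two side effects show up in this sketch. The relative order of differently named siblings is lost (children a, b, a come back as a list under 'a' plus a single 'b'), and a list that happens to contain only one item turns into a bare value instead of a one-element list.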