martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License

Would it be possible to get the attributes of the root element returned in the dict as well? #319

Open mathijssh opened 1 year ago

mathijssh commented 1 year ago

I'm currently parsing an XML document in the format of

<root_element>
   <node id="1">
      ...
   </node>
   <node id="2">
      ...
   </node>
   ...
</root_element>

and stream-parsing it with a callback in the form the docs prescribe:

def handle_node(event, element):
    ...

I'm interested in all the nodes' data, including their attributes. When parsing this with xmltodict.parse(..., item_depth=2, item_callback=handle_node) I could not find the id attribute in the element dict. After some "debugging" I found that the attribute is hidden in the event tuple. Is this intended/desired behavior? I personally would prefer to have those attributes as keys in the element dict. Curious to hear any thoughts :)

mpf82 commented 1 year ago

If the item contained the parent's (root) attributes as keys, this would break whenever a child node has the same name as one of the parent's attributes.

import xmltodict

xml = """
<root_element>
   <node id="1"><id x="foo"/></node>
   <node id="2"><id x="bar"/></node>
</root_element>
""".strip()

def handle_item(path, item):
    print("handle_item", path, "/", item)
    return True

xmltodict.parse(xml, item_depth=2, item_callback=handle_item)

Output:

handle_item [('root_element', None), ('node', {'id': '1'})] / {'id': {'@x': 'foo'}}
handle_item [('root_element', None), ('node', {'id': '2'})] / {'id': {'@x': 'bar'}}

Let's look at the first line printed: handle_item [('root_element', None), ('node', {'id': '1'})] / {'id': {'@x': 'foo'}}

Should {'id': {'@x': 'foo'}} be {'id': 1} instead?

Accessing the last item in the path (path[-1], a (name, attributes) tuple) should be easy enough, and it will not break things.
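For anyone who still wants the attributes next to the child data, here is a minimal sketch of doing that inside the callback. It reuses the XML from the example above and reads the attributes from the last path entry. Storing them under "@"-prefixed keys is an assumption made here to sidestep the name collision (it mirrors the prefix xmltodict uses for attributes in non-streaming mode). The `collected` list and the `merged` dict are illustrative names, not part of xmltodict.

```python
import xmltodict

xml = """
<root_element>
   <node id="1"><id x="foo"/></node>
   <node id="2"><id x="bar"/></node>
</root_element>
""".strip()

collected = []  # illustrative container for the merged results

def handle_item(path, item):
    # The last path entry describes the element that was just parsed:
    # (element name, dict of its attributes or None).
    name, attrs = path[-1]
    # Prefix attribute names with "@" so they cannot clash with child names.
    merged = {"@" + key: value for key, value in (attrs or {}).items()}
    if isinstance(item, dict):
        merged.update(item)
    elif item is not None:
        # A text-only element arrives as a plain string.
        merged["#text"] = item
    collected.append((name, merged))
    return True  # keep parsing

xmltodict.parse(xml, item_depth=2, item_callback=handle_item)
print(collected)
```

With the sample document, each collected entry should hold both the node's own id attribute (as "@id") and its id child element, without one overwriting the other.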

mathijssh commented 1 year ago

Thank you, indeed makes sense to prevent such a collision! And as long as the data of the attributes remains available to the user via the path tuple then it's great. Would it be an idea to point users to this in the docs?

Thanks again for the swift and clear response!