martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License

Duplicate keys parsing #202

Open ashrafemad opened 5 years ago

ashrafemad commented 5 years ago

When I try to parse an XML tree in which several nodes share the same key, for example:

job number 1 job detils

job number 2 job detils number 2

it combines them into one `Dict` or `OrderedDict` with a `Job` key. Is there any way to avoid this and access each item on its own?

snoopyjc commented 5 years ago

The Perl version of this type of module gives you an array of these objects. The Python version should return a list in this case.

shouldsee commented 4 years ago

I find this merging undesirable in some cases because the resulting object has a dynamic type (it can be an OrderedDict or a list, depending on the context).

import xmltodict

x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
<Job> <x>job number 2</x> <y>job detils number 2</y> 
</Job>
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'list'>
x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
</Body>
'''.strip()
print(type(xmltodict.parse(x)['Body']['Job']))
# <class 'collections.OrderedDict'>

I would suggest controlling this behaviour with a keyword such as "merge_duplication".

The simplest solution would be to force the type to always be a list, but this means we would have to write x['Body'][0]['Job'][0]['Name'][0], which is quite ugly (but useful and stable).
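
One way to cope with the dynamic type without changing how the document is parsed is to normalise the value at the point of use. This is only a minimal sketch: as_list is a hypothetical helper (not part of xmltodict), and the dicts are hand-written stand-ins for the two parse results above.

```python
def as_list(value):
    """Wrap a parsed value in a list unless it already is one.

    Hypothetical helper, not part of xmltodict.
    """
    if value is None:
        # A missing key (None from dict.get) becomes an empty list.
        # Note that an empty element also parses to None.
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written stand-ins for the two shapes shown above.
two_jobs = {'Body': {'Job': [{'x': 'job number 1', 'y': 'job detils'},
                             {'x': 'job number 2', 'y': 'job detils number 2'}]}}
one_job = {'Body': {'Job': {'x': 'job number 1', 'y': 'job detils'}}}

for doc in (two_jobs, one_job):
    # The loop body is the same whether there is one Job or several.
    for job in as_list(doc['Body'].get('Job')):
        print(job['x'])
```

This keeps the short x['Body']['Job'] access for the parsed result and only pays the list cost where the code actually iterates.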

shouldsee commented 4 years ago

Ahh, found an option called force_list:

        If called with force_list=('interface',), it will produce
        this dictionary:
        {'servers':
          {'server':
            {'name': 'host1',
             'os': 'Linux'},
             'interfaces':
              {'interface':
                [ {'name': 'em0', 'ip_address': '10.0.0.1' } ] } } }
        `force_list` can also be a callable that receives `path`, `key` and
        `value`. This is helpful in cases where the logic that decides whether
        a list should be forced is more complex.
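
For illustration, a callable force_list might look like the sketch below. The function name force_job_under_body and the Archive element are made up for this example. The (path, key, value) signature follows the docstring quoted above; that each path entry is a (name, attrs) tuple for an ancestor element is an assumption worth checking against your installed version.

```python
import xmltodict

xml = '''
<Body>
  <Job><x>job number 1</x></Job>
  <Archive>
    <Job><x>old job</x></Job>
  </Archive>
</Body>
'''.strip()


def force_job_under_body(path, key, value):
    """Force a list only for <Job> elements that sit directly under <Body>.

    Assumes `path` is a list of (name, attrs) tuples for the ancestors.
    """
    if key != 'Job':
        return False
    return bool(path) and path[-1][0] == 'Body'


doc = xmltodict.parse(xml, force_list=force_job_under_body)
print(type(doc['Body']['Job']))             # a list: forced by the callable
print(type(doc['Body']['Archive']['Job']))  # a single mapping: not forced
```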

edenilson-carvalho commented 3 years ago

Hey guys, do you have any suggestions for solving this issue? With a dictionary it's not possible, because it behaves like a hash table.

SplinterHead commented 1 year ago

> Hey guys, do you have any suggestions for solving this issue? With a dictionary it's not possible, because it behaves like a hash table.

This has been discussed before; #14 has a lengthy discussion.

If an element (in this example, Job) occurs only once under its parent, xmltodict returns a dict. If it occurs more than once, xmltodict returns a list.

The best way to normalise this behaviour is to use the force_list argument so that a list is always returned.

import xmltodict

x = '''
<Body>
<Job> 
  <x>job number 1</x> 
  <y>job detils</y>
</Job> 
</Body>
'''.strip()
# force_list takes a tuple of element names (as in the docstring) or a callable
print(type(xmltodict.parse(x, force_list=('Job',))['Body']['Job']))
# <class 'list'>
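
With the list forced, calling code can iterate without checking the type first. A short sketch, assuming a hypothetical helper named job_titles and passing force_list as a tuple, the form shown in the docstring quoted earlier in this thread:

```python
import xmltodict


def job_titles(xml):
    """Return the <x> text of every <Job>, however many there are."""
    body = xmltodict.parse(xml, force_list=('Job',))['Body']
    return [job['x'] for job in body['Job']]


one = '<Body><Job><x>job number 1</x><y>job detils</y></Job></Body>'
two = ('<Body>'
       '<Job><x>job number 1</x><y>job detils</y></Job>'
       '<Job><x>job number 2</x><y>job detils number 2</y></Job>'
       '</Body>')

print(job_titles(one))  # one Job, still handled as a list
print(job_titles(two))  # two Jobs, same code path
```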