martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License
5.52k stars, 462 forks

Converting a parsed XML dict back to XML reorders sibling elements, and the original order cannot be recovered #340

Open yangsongbai1 opened 11 months ago

yangsongbai1 commented 11 months ago

Given the following XML:

<w:document xmlns:w="http://www.example.com/wordproc">
    <w:body>
        <w:p>
            abc
        </w:p>
        <w:b>
            def
        </w:b>
        <w:p>
            ghi
        </w:p>
        <w:i>
            gkl
        </w:i>
        <w:p>
            mno
        </w:p>
    </w:body>
</w:document>

it parses into the following structure (shown as JSON):

{
    "w:document":{
        "@xmlns:w":"http://www.example.com/wordproc",
        "w:body":{
            "w:p":[
                "abc",
                "ghi",
                "mno"
            ],
            "w:b":"def",
            "w:i":"gkl"
        }
    }
}

Converting this back to XML does not preserve the original element order:

<w:document
    xmlns:w="http://www.example.com/wordproc">
    <w:body>
        <w:p>abc</w:p>
        <w:p>ghi</w:p>
        <w:p>mno</w:p>
        <w:b>def</w:b>
        <w:i>gkl</w:i>
    </w:body>
</w:document>
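
The reordering follows from the dict layout shown above: repeated siblings are collected into one list under their tag name, so the positions of w:b and w:i relative to the w:p elements are not stored anywhere. Below is a minimal sketch of that effect in plain Python. The `group_by_tag` helper is hypothetical and only mimics the layout; it is not xmltodict's actual code.

```python
def group_by_tag(children):
    """Group (tag, text) pairs by tag, roughly mimicking the dict layout above."""
    grouped = {}
    for tag, text in children:
        if tag not in grouped:
            grouped[tag] = text                   # first occurrence: plain value
        elif isinstance(grouped[tag], list):
            grouped[tag].append(text)             # third and later occurrences
        else:
            grouped[tag] = [grouped[tag], text]   # second occurrence: promote to list
    return grouped


# Children of <w:body> in document order
original = [("w:p", "abc"), ("w:b", "def"), ("w:p", "ghi"),
            ("w:i", "gkl"), ("w:p", "mno")]

grouped = group_by_tag(original)

# Flattening the grouped dict back out can only emit one tag group at a time
restored = []
for tag, value in grouped.items():
    values = value if isinstance(value, list) else [value]
    restored.extend((tag, v) for v in values)

print([tag for tag, _ in restored])
# ['w:p', 'w:p', 'w:p', 'w:b', 'w:i']
```

Once siblings are keyed by tag, the interleaving is gone, which is why the output above cannot match the input.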

I would like it to parse into the following format instead. How can I achieve this? I tried the force_list argument, but it did not work.

{
    "w:document": [
        {
            "@xmlns:w": "http://www.example.com/wordproc"
        },
        {
            "w:body": [
                {
                    "w:p": "abc"
                },
                {
                    "w:b": "def"
                },
                {
                    "w:p": "ghi"
                },
                {
                    "w:i": "gkl"
                },
                {
                    "w:p": "mno"
                }
            ]
        }
    ]
}

pyhedgehog commented 8 months ago

@yangsongbai1 Your suggested result would become (after unparse()) something like:

<w:document
    xmlns:w="http://www.example.com/wordproc">
    <w:body><w:p>abc</w:p></w:body>
    <w:body><w:p>ghi</w:p></w:body>
    <w:body><w:p>mno</w:p></w:body>
    <w:body><w:b>def</w:b></w:body>
    <w:body><w:i>gkl</w:i></w:body>
</w:document>

There is a related issue, #247, that suggests a better syntax (a special $list key).
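
Until something like that exists, one possible workaround is to build the order-preserving structure outside xmltodict. The sketch below uses only the standard library (`xml.dom.minidom`); `to_ordered` and `to_xml` are hypothetical helper names, not part of xmltodict. It assumes element-only or text-only content: in mixed content, text is collected under a `#text` entry at the end, so its position relative to child elements is lost.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape, quoteattr


def to_ordered(element):
    """Convert a DOM element to {tag: value}, keeping children in document order."""
    items = []
    # Attributes (including xmlns declarations) become "@name" entries
    for name, value in element.attributes.items():
        items.append({"@" + name: value})
    text_parts = []
    for child in element.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            items.append(to_ordered(child))
        elif child.nodeType in (child.TEXT_NODE, child.CDATA_SECTION_NODE):
            text_parts.append(child.data)
    text = "".join(text_parts).strip()
    if not items:
        return {element.tagName: text}      # leaf element: plain string value
    if text:
        items.append({"#text": text})       # mixed content: position is not kept
    return {element.tagName: items}


def to_xml(node):
    """Serialize a {tag: value} node produced by to_ordered back to XML text."""
    (tag, value), = node.items()
    if isinstance(value, str):
        return f"<{tag}>{escape(value)}</{tag}>"
    attrs, body = [], []
    for item in value:
        (key, child), = item.items()
        if key.startswith("@"):
            attrs.append(f" {key[1:]}={quoteattr(child)}")
        elif key == "#text":
            body.append(escape(child))
        else:
            body.append(to_xml(item))
    return f"<{tag}{''.join(attrs)}>{''.join(body)}</{tag}>"


source = """<w:document xmlns:w="http://www.example.com/wordproc">
    <w:body>
        <w:p>abc</w:p>
        <w:b>def</w:b>
        <w:p>ghi</w:p>
        <w:i>gkl</w:i>
        <w:p>mno</w:p>
    </w:body>
</w:document>"""

ordered = to_ordered(minidom.parseString(source).documentElement)
roundtrip = to_xml(ordered)
print(roundtrip)
```

For the sample document, `ordered` has the same shape as the format requested in the original post, and `roundtrip` keeps the w:p, w:b, w:p, w:i, w:p sequence. The trade-off is that lookups such as `doc["w:document"]["w:body"]["w:p"]` no longer work, since every level is a list of single-key dicts.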