martinblech / xmltodict

Python module that makes working with XML feel like you are working with JSON
MIT License

Preserve the doctype when parsing and unparsing #351

Open Ravencentric opened 3 months ago

Ravencentric commented 3 months ago

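Current behavior: xmltodict discards the DOCTYPE declaration, so it is lost after a parse/unparse round trip.

Expected behavior: xmltodict should preserve the DOCTYPE declaration when parsing and emit it again when unparsing.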

Reproduction:

import xmltodict

xml = """\
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE library PUBLIC "-//Example//DTD Library 1.0//EN" "http://www.example.com/DTD/library.dtd">
<library>
    <book>
        <title>XML Basics</title>
        <author>John Doe</author>
        <published>2020</published>
    </book>
    <book>
        <title>Advanced XML Techniques</title>
        <author>Jane Smith</author>
        <published>2021</published>
    </book>
</library>
"""

parsed = xmltodict.parse(xml)
unparsed = xmltodict.unparse(parsed, pretty=True)
print(unparsed)
# Output (the DOCTYPE line is missing):
# <?xml version="1.0" encoding="utf-8"?>
# <library>
#         <book>
#                 <title>XML Basics</title>
#                 <author>John Doe</author>
#                 <published>2020</published>
#         </book>
#         <book>
#                 <title>Advanced XML Techniques</title>
#                 <author>Jane Smith</author>
#                 <published>2021</published>
#         </book>
# </library>
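
Until something like this is supported natively, one possible workaround is to capture the DOCTYPE separately and put it back into the unparsed output. The sketch below uses only the standard library's `xml.parsers.expat` and its `StartDoctypeDeclHandler` callback. The helper names `extract_doctype` and `reinsert_doctype` are hypothetical and not part of xmltodict, and the sketch assumes the DOCTYPE has no internal subset (any internal subset would be dropped).

```python
import xml.parsers.expat


def extract_doctype(xml_text):
    """Return the DOCTYPE declaration found in xml_text, or None.

    Hypothetical helper, not part of xmltodict. Any internal subset
    ([...] inside the DOCTYPE) is not reproduced.
    """
    found = []

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # expat reports the parts of the declaration; rebuild it from them.
        if public_id is not None:
            external = f' PUBLIC "{public_id}" "{system_id}"'
        elif system_id is not None:
            external = f' SYSTEM "{system_id}"'
        else:
            external = ""
        found.append(f"<!DOCTYPE {name}{external}>")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


def reinsert_doctype(xml_out, doctype):
    """Insert doctype after the XML declaration of xml_out, if any.

    Hypothetical helper, not part of xmltodict.
    """
    if doctype is None:
        return xml_out
    if xml_out.startswith("<?xml"):
        # Split right after the closing "?>" of the XML declaration.
        end = xml_out.index("?>") + 2
        return xml_out[:end] + "\n" + doctype + xml_out[end:]
    return doctype + "\n" + xml_out
```

With the reproduction above, this would be used as `reinsert_doctype(xmltodict.unparse(parsed, pretty=True), extract_doctype(xml))`. The input is parsed twice, which is acceptable for small documents but is a workaround rather than a fix in the library itself.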