Open Kochise opened 1 year ago
You can specify the encoding in parse(); the default is utf-8.
IANA currently lists 250+ character encodings.
Python natively supports a subset of 109 encodings (plus some Python-specific encodings).
You cannot possibly expect xmltodict to know or to guess which one your input uses.
Seems you're right, explicitly passing bytes with a BOM works just fine:
import xmltodict
xml = '''<?xml version="1.0"?><test>123</test>'''
xml = xml.encode("utf-8-sig")
out = xmltodict.parse(xml)
print(out) # {'test': '123'}
So maybe the error is somewhere else? Either the file has a different encoding, or the other libs you're using are modifying the string/bytes somehow.
Edit: these also work:
from io import BytesIO, StringIO
b = BytesIO(b'\xef\xbb\xbf<?xml version="1.0"?><test>123</test>')
print(xmltodict.parse(b.read()))
b = StringIO(b'<?xml version="1.0"?><test>123</test>'.decode("utf-8-sig"))
print(xmltodict.parse(b.read()))
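To check the other possibility, that something upstream changes the data before it reaches xmltodict, it can help to look at what the same file yields depending on how it is opened. A rough stdlib-only sketch follows; the file name is made up for illustration, and whether yaplon reads files in any of these ways is not confirmed here.

```python
import os
import tempfile

# Same payload as in the examples above: UTF-8 BOM followed by the XML.
xml_bytes = b'\xef\xbb\xbf<?xml version="1.0"?><test>123</test>'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bom_sample.xml")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(xml_bytes)

    # Binary mode: the three BOM bytes are kept as-is.
    with open(path, "rb") as f:
        as_bytes = f.read()
    # Text mode with plain utf-8: the BOM survives as a leading U+FEFF.
    with open(path, encoding="utf-8") as f:
        as_utf8 = f.read()
    # Text mode with utf-8-sig: the BOM is stripped during decoding.
    with open(path, encoding="utf-8-sig") as f:
        as_utf8_sig = f.read()

print(as_bytes[:3])        # b'\xef\xbb\xbf'
print(repr(as_utf8[0]))    # '\ufeff'
print(as_utf8_sig[0])      # <
```

Comparing these three values with what actually arrives at parse() should show whether the BOM is still present, and in which form.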
I'm just using yaplon (https://github.com/twardoch/yaplon):
D:\Pyenv310>xml22yaml -i "d:\Pyenv310\TEST\Alarms.xml" -o "d:\Pyenv310\TEST\Alarms.yaml"
It fails here:
https://github.com/martinblech/xmltodict/blob/master/xmltodict.py#L378
when called from here:
https://github.com/twardoch/yaplon/blob/master/yaplon/reader.py#L71
There seems to be an issue around here:
https://github.com/martinblech/xmltodict/blob/master/xmltodict.py#L341
You can reproduce this with any XML file that has a BOM.
Regards.