Open GoogleCodeExporter opened 8 years ago
The spec says that content must be in UTF-8 or UTF-16.
lxml can handle UTF-16, so we should be able to as well:
>>> from lxml import etree
>>> test = u'<foo>bar</foo>'
>>> etree.XML(test)
<Element foo at 1678300>
>>> etree.XML(test.encode('utf-8'))
<Element foo at 1678210>
>>> etree.XML(test.encode('utf-16'))
<Element foo at 1678360>
>>> etree.XML(test.encode('utf-32'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: utf-32
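Here is a minimal stdlib-only sketch of how the checker could tell which encoding a document uses before parsing it. It assumes Python 3, does not depend on lxml, and follows the non-normative autodetection table in Appendix F of the XML 1.0 spec. `sniff_encoding` and `check_spec_encoding` are hypothetical names, not existing project or lxml functions.

```python
import codecs

# Leading-byte signatures (after XML 1.0 Appendix F), checked in order.
# The UTF-32 BOMs come first because the UTF-32-LE BOM (FF FE 00 00)
# also starts with the UTF-16-LE BOM (FF FE).
_SIGNATURES = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    # No BOM: infer the encoding from how the leading "<" (and "?") bytes
    # are laid out.
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
]

# Encodings the spec permits (UTF-16 in either byte order).
ALLOWED = {"utf-8", "utf-16-be", "utf-16-le"}


def sniff_encoding(data):
    """Guess the encoding of raw XML bytes from their first few bytes."""
    for prefix, name in _SIGNATURES:
        if data.startswith(prefix):
            return name
    return "utf-8"  # fall back to XML's default when nothing matches


def check_spec_encoding(data):
    """Return the sniffed encoding, or raise ValueError if the spec forbids it."""
    name = sniff_encoding(data)
    if name not in ALLOWED:
        raise ValueError(
            "content appears to be %s; the spec allows only UTF-8 or UTF-16" % name
        )
    return name
```

Sniffing only looks at the first four bytes, so it is cheap. It is still a guess, though: the bytes could be mis-encoded further in, which only a full decode will reveal.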
1. Create a test case where the content is in UTF-16 and verify that it works.
2. Add error messaging that reports any other UnicodeDecodeError as a likely sign of some broken (e.g. Windows) encoding.
Original comment by liza31337@gmail.com on 11 Sep 2008 at 3:14
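The two steps in the list above could look roughly like this. It is a sketch that assumes Python 3 and the standard library only. `decode_content` is a hypothetical helper that only distinguishes BOM-marked UTF-16 from UTF-8 and does not try to detect UTF-32. Naming Windows-1252 in the message is a guess at the most common culprit, not something the decoder can know.

```python
import codecs
import unittest


def decode_content(data):
    """Decode content as UTF-16 (if it starts with a BOM) or else as UTF-8.

    Hypothetical helper: any UnicodeDecodeError is re-raised as a
    ValueError whose message points at a likely non-Unicode encoding.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        encoding, label = "utf-16", "UTF-16"  # codec reads and strips the BOM
    else:
        encoding, label = "utf-8-sig", "UTF-8"  # UTF-8, optional BOM stripped
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        raise ValueError(
            "Content is not valid %s (bad byte at offset %d). It is probably "
            "in some other encoding, such as Windows-1252; the spec requires "
            "UTF-8 or UTF-16." % (label, err.start)
        ) from err


class EncodingTest(unittest.TestCase):
    TEXT = "<foo>bar</foo>"

    def test_utf16_content_is_accepted(self):
        self.assertEqual(decode_content(self.TEXT.encode("utf-16")), self.TEXT)

    def test_utf8_content_is_accepted(self):
        self.assertEqual(decode_content(self.TEXT.encode("utf-8")), self.TEXT)

    def test_windows_1252_content_gets_a_helpful_error(self):
        # 0xE9 ("é" in Windows-1252) followed by "<" is not valid UTF-8.
        data = "<foo>caf\u00e9</foo>".encode("cp1252")
        with self.assertRaises(ValueError) as ctx:
            decode_content(data)
        self.assertIn("Windows-1252", str(ctx.exception))
```

The test case can be collected by `python -m unittest` or pytest like any other `TestCase`.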
(assuming of course that the encoding declaration is right)
Original comment by liza31337@gmail.com on 11 Sep 2008 at 3:17
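The caveat above can also be checked explicitly. Below is a rough sketch that assumes Python 3. `declared_encoding` and `declaration_consistent` are hypothetical helpers. They pull the `encoding` pseudo-attribute out of the XML declaration and compare it with the encoding the bytes were actually decoded with, ignoring byte-order and BOM variants, so a declared `UTF-16` matches `utf-16-le`.

```python
import codecs
import re

# Matches the encoding pseudo-attribute of an XML declaration at the start
# of the document, e.g. <?xml version="1.0" encoding="UTF-16"?>.
_DECL = re.compile(r"""<\?xml\s[^?]*?encoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1""")


def declared_encoding(text):
    """Return the encoding named in the XML declaration, or None."""
    match = _DECL.match(text.lstrip("\ufeff"))  # ignore a decoded BOM
    return match.group(2) if match else None


def _family(name):
    """Canonical codec name with byte-order/BOM variants folded together."""
    canonical = codecs.lookup(name).name  # e.g. "UTF8" -> "utf-8"
    for suffix in ("-sig", "-le", "-be"):
        if canonical.endswith(suffix):
            return canonical[: -len(suffix)]
    return canonical


def declaration_consistent(text, actual_encoding):
    """True if the declaration (if any) agrees with the encoding actually used."""
    declared = declared_encoding(text)
    if declared is None:
        return True  # no declaration, so nothing to contradict
    try:
        return _family(declared) == _family(actual_encoding)
    except LookupError:
        return False  # an unknown codec name counts as a mismatch
```

Names are normalized with `codecs.lookup`, so aliases such as `utf8` and `UTF-8` compare equal.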
Original issue reported on code.google.com by liza31337@gmail.com on 10 Sep 2008 at 2:09