html5lib / gcode-import

Automatically exported from code.google.com/p/html5lib. Purely archival.

Non-BMP characters don't serialize correctly in UTF-16 Python #143

Status: Closed (GoogleCodeExporter closed this issue 9 years ago)

GoogleCodeExporter commented 9 years ago
>>> foo = html5lib.parse("<span>&#x1D4CF;</span>", treebuilder="lxml")
>>> foo.getroot()[1][0].text
u'\U0001d4cf'
>>> html5lib.serialize(foo, "lxml", encoding="ascii")
'<meta charset=ascii><span>&#55349;&#56527;</span>'

Original issue reported on code.google.com by geoffers on 31 Mar 2010 at 1:36
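
For readers following the report: on a narrow (UTF-16) Python build, U+1D4CF is stored as the surrogate pair 0xD835 0xDCCF, and the output above shows each half escaped separately (55349 and 56527) instead of as one reference to code point 120015. The sketch below, written for Python 3, shows the arithmetic for recombining a pair before emitting a character reference. The names `combine_surrogates` and `char_refs` are hypothetical helpers for illustration, not html5lib API, and this is not necessarily how the fix in the serializer was made.

```python
def combine_surrogates(high, low):
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def char_refs(text):
    """Replace non-ASCII characters with decimal character references.

    A high surrogate followed by a low surrogate is treated as one
    character. Hypothetical helper: it does not escape '&' or '<'.
    """
    out = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if 0xD800 <= cp <= 0xDBFF and i + 1 < len(text):
            nxt = ord(text[i + 1])
            if 0xDC00 <= nxt <= 0xDFFF:
                cp = combine_surrogates(cp, nxt)
                i += 1  # skip the low surrogate we just consumed
        out.append(chr(cp) if cp < 0x80 else "&#%d;" % cp)
        i += 1
    return "".join(out)


print(char_refs("<span>\ud835\udccf</span>"))
```

With the pair recombined, the span's content is escaped as a single reference, `&#120015;`, which is the decimal form of `&#x1D4CF;` from the original input.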

GoogleCodeExporter commented 9 years ago
This bug is also breaking the build of the multipage copy of the spec

Original comment by sideshowbarker on 1 Apr 2010 at 8:37

GoogleCodeExporter commented 9 years ago
Should be fixed in rev. 4113ad9d98a8

Original comment by ja...@hoppipolla.co.uk on 2 Apr 2010 at 5:12