Closed GoogleCodeExporter closed 9 years ago
In both Firefox 4 and the Opera Ragnarök build the following DOM is produced
according to http://software.hixie.ch/utilities/js/live-dom-viewer/saved/871:
<!DOCTYPE HTML><html><head></head><body><p><code x<="" code=""></code></p><code x<="" code="">
</code></body></html>
Original comment by philip.j...@gmail.com
on 7 Mar 2011 at 7:48
It seems like there's some coercion of the attribute values that should happen
that isn't happening in this case, because this input:
<p><code x<=foo></code></p>
Produces this output:
<!DOCTYPE html><p><code xU0003C=foo></code></p>
So perhaps the coercion step is skipped for the original input?
Using this code:
#!/usr/bin/env python
import sys
import html5lib
from html5lib import treebuilders, treewalkers, serializer
# Parse into an lxml tree, then walk the tree and serialize it back to HTML.
doc = html5lib.parse(open("minimized.html"), treebuilder="lxml")
walker = treewalkers.getTreeWalker("lxml")
s = serializer.htmlserializer.HTMLSerializer()
for x in s.serialize(walker(doc)):
    sys.stdout.write(x)
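For reference, the coercion visible in the output above (`x<` becoming `xU0003C`) can be sketched roughly as follows. This is only an illustration inferred from that output, not html5lib's actual code: the name `coerce_name` and the simplified set of allowed characters are assumptions.

```python
import re

# Assumption: a deliberately simplified idea of which characters are allowed in
# an XML name. The real XML rules are broader and treat the first character
# specially.
_INVALID = re.compile(r"[^A-Za-z0-9_.:-]")


def coerce_name(name):
    """Replace each disallowed character with 'U' plus its code point as five hex digits."""
    return _INVALID.sub(lambda m: "U%05X" % ord(m.group(0)), name)


print(coerce_name("x<"))    # xU0003C
print(coerce_name("code"))  # code
```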
Original comment by philip.j...@gmail.com
on 7 Mar 2011 at 8:01
Original comment by philip.j...@gmail.com
on 7 Mar 2011 at 6:51
Phew, there's actually nothing magic about file input, the difference was a
trailing linebreak in the file input. This is enough to reproduce:
import html5lib
html5lib.parse("<p><code x</code></p>\n", treebuilder="lxml")
Original comment by philip.j...@gmail.com
on 8 Mar 2011 at 8:22
James, I now have what seems to be a working fix, could you review it for
sanity? It's trivial, but I don't really understand the relationship between
the Element class in treebuilders/etree.py and treebuilders/etree_lxml.py,
beyond the fact that the latter inherits from the former.
The root cause is that the attributes on the underlying etree Element are
coerced, but the attributes on the wrapping Element are not. cloneNode was
trying to copy the uncoerced attributes of the wrapper Element to an lxml
Element.
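To make the root cause concrete, here is a minimal, self-contained sketch of the mismatch. None of these classes are html5lib's or lxml's: `StrictElement`, `Wrapper`, `clone_buggy` and `clone_fixed` are hypothetical stand-ins, and `clone_fixed` is just one way to avoid the problem, not necessarily what the patch or the committed fix does. The stand-in element raises on an invalid attribute name purely to make the failure visible.

```python
import re

# Assumption: simplified check for characters not allowed in an XML name.
_INVALID = re.compile(r"[^A-Za-z0-9_.:-]")


def coerce_name(name):
    """Hypothetical coercion: disallowed characters become 'U' + five hex digits."""
    return _INVALID.sub(lambda m: "U%05X" % ord(m.group(0)), name)


class StrictElement:
    """Stand-in for the underlying element; rejects invalid attribute names."""

    def __init__(self, tag):
        self.tag = tag
        self.attrib = {}

    def set(self, name, value):
        if _INVALID.search(name):
            raise ValueError("Invalid attribute name %r" % name)
        self.attrib[name] = value


class Wrapper:
    """Stand-in for the wrapping Element that the tree builder works with."""

    def __init__(self, tag):
        self.tag = tag
        self.attributes = {}                # names exactly as parsed (uncoerced)
        self._element = StrictElement(tag)  # underlying element (coerced names)

    def set_attributes(self, attrs):
        self.attributes = dict(attrs)
        for name, value in attrs.items():
            self._element.set(coerce_name(name), value)

    def clone_buggy(self):
        # Copies the wrapper's uncoerced names straight onto a new element.
        clone = StrictElement(self.tag)
        for name, value in self.attributes.items():
            clone.set(name, value)
        return clone

    def clone_fixed(self):
        # Copies the already-coerced names from the underlying element instead.
        clone = StrictElement(self.tag)
        for name, value in self._element.attrib.items():
            clone.set(name, value)
        return clone


w = Wrapper("code")
w.set_attributes({"x<": ""})
try:
    w.clone_buggy()
except ValueError as exc:
    print("buggy clone failed:", exc)
print("fixed clone attributes:", w.clone_fixed().attrib)
```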
Where would it be appropriate to add a test for this?
Original comment by philip.j...@gmail.com
on 10 Mar 2011 at 7:55
Fixed in
http://code.google.com/p/html5lib/source/detail?r=99e8af7f0c486da0f7ca7e570177d8f7b9f68ed4
The fix is a little different to the patch here.
Original comment by ja...@hoppipolla.co.uk
on 10 Mar 2011 at 10:42
Original issue reported on code.google.com by
philip.j...@gmail.com
on 6 Mar 2011 at 10:29