html5lib / gcode-import

Automatically exported from code.google.com/p/html5lib. Purely archival.

Two nested formatting tags cause a TypeError #207

Closed: GoogleCodeExporter closed this issue 9 years ago.

GoogleCodeExporter commented 9 years ago
{{{
import html5lib
from html5lib import treebuilders

s = '''<html>
<head></head>
<body>
    <font><font></font></font>
</body>
</html>'''

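# The 'beautifulsoup' tree builder (used with BeautifulSoup 3.2.1 here) triggers the error
parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder('beautifulsoup'))
r = parser.parse(s)  # raises TypeError on the nested <font> tags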
}}}

The bug appears when two formatting tags are nested.

In that case, html5lib calls `isMatchingFormattingElement`, which compares the lengths of the two nodes' attribute lists:
{{{
def isMatchingFormattingElement(self, node1, node2):
  if node1.name != node2.name or node1.namespace != node2.namespace:
    return False
  elif len(node1.attributes) != len(node2.attributes):
    return False
  else:
    attributes1 = sorted(node1.attributes.items())
}}}
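Here `node1.attributes` and `node2.attributes` are `AttrList` instances, and calling `len()` on an `AttrList` is what raises the `TypeError`.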
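To illustrate the failure mode, here is a minimal sketch using a hypothetical stand-in class (`AttrListSketch` is not html5lib's real `AttrList`). An object that exposes `items()` but defines no `__len__` makes `len()` raise `TypeError`, while counting its `items()` still works.

```python
class AttrListSketch:
    """Hypothetical stand-in for the treebuilder's attribute wrapper:
    it exposes items() but, like the buggy AttrList, defines no __len__."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def items(self):
        return list(self.attrs.items())


def length_check(attrs):
    """Return len(attrs) as the original check computes it, or the raised error."""
    try:
        return len(attrs)
    except TypeError as exc:
        return exc


attrs = AttrListSketch({"color": "red"})
print(type(length_check(attrs)).__name__)  # TypeError
print(len(attrs.items()))                  # 1
```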

In my case, I fixed the problem by taking the length of `attributes.items()` instead:
{{{
def isMatchingFormattingElement(self, node1, node2):
  if node1.name != node2.name or node1.namespace != node2.namespace:
    return False
  elif len(node1.attributes.items()) != len(node2.attributes.items()):
    return False
  else:
    attributes1 = sorted(node1.attributes.items())
}}}
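A self-contained sketch of the patched comparison, using hypothetical `Node` and `ItemsOnlyAttrs` stand-ins rather than html5lib's real classes. It counts attributes via `items()`, so it works even when the attribute object lacks `__len__`. The final step (comparing the sorted `(name, value)` pairs) is an assumption about how the elided remainder of the method proceeds.

```python
from dataclasses import dataclass


class ItemsOnlyAttrs:
    """Hypothetical attribute wrapper that has items() but no __len__."""

    def __init__(self, attrs):
        self._attrs = dict(attrs)

    def items(self):
        return list(self._attrs.items())


@dataclass
class Node:
    """Hypothetical minimal element: tag name, namespace, attribute wrapper."""
    name: str
    namespace: str
    attributes: ItemsOnlyAttrs


def is_matching_formatting_element(node1, node2):
    """Patched check: count attributes via items() instead of len(attributes)."""
    if node1.name != node2.name or node1.namespace != node2.namespace:
        return False
    items1 = node1.attributes.items()
    items2 = node2.attributes.items()
    if len(items1) != len(items2):
        return False
    # Assumed continuation: the elements match if their sorted
    # (name, value) pairs are identical.
    return sorted(items1) == sorted(items2)


HTML_NS = "http://www.w3.org/1999/xhtml"
outer = Node("font", HTML_NS, ItemsOnlyAttrs({}))
inner = Node("font", HTML_NS, ItemsOnlyAttrs({}))
print(is_matching_formatting_element(outer, inner))  # True
```

With the two attribute-less `<font>` elements from the report, the check now returns `True` instead of raising.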

Could the problem be in BeautifulSoup (3.2.1) itself?

Original issue reported on code.google.com by daniel.l...@gmail.com on 30 May 2012 at 1:18

GoogleCodeExporter commented 9 years ago
I think this is similar to this BS4 bug:
https://bugs.launchpad.net/beautifulsoup/+bug/943246

It can be solved the same way it was in BS4, by adding a `__len__` method to the `AttrList` class:
https://bazaar.launchpad.net/~leonardr/beautifulsoup/bs4/revision/181
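The suggested fix amounts to giving the wrapper a `__len__`. A minimal sketch, with a hypothetical class name and a plain dict standing in for the attribute storage (the real `AttrList` wraps a BeautifulSoup element):

```python
class AttrListWithLen:
    """Hypothetical attribute wrapper with the suggested __len__ added."""

    def __init__(self, attrs):
        self.attrs = dict(attrs)

    def items(self):
        return list(self.attrs.items())

    def __len__(self):
        # Lets the original len(node.attributes) check work unchanged.
        return len(self.attrs)


a = AttrListWithLen({"color": "red", "size": "2"})
print(len(a) == len(a.items()))  # True
```

One side effect worth checking: once `__len__` exists, an empty wrapper is falsy in a boolean context, so any `if node.attributes:` style test elsewhere changes meaning for elements without attributes.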

Original comment by chase.sterling@gmail.com on 30 May 2012 at 2:25

GoogleCodeExporter commented 9 years ago
If it's a BeautifulSoup problem, then this report can be closed.
Thanks.

Original comment by daniel.l...@gmail.com on 30 May 2012 at 2:41

GoogleCodeExporter commented 9 years ago
It's not a BeautifulSoup problem; it's a problem with html5lib's BeautifulSoup
treebuilder. That code, along with the bug, just happened to be carried over
into bs4, where it has already been fixed.

Original comment by chase.sterling@gmail.com on 30 May 2012 at 2:47

GoogleCodeExporter commented 9 years ago
BeautifulSoup support has now been dropped from html5lib.

Original comment by geoffers on 9 Apr 2013 at 9:03