google / gumbo-parser

An HTML5 parsing library in pure C99
Apache License 2.0

TypeError when getting the tag name of a path element from python #409

Closed. ianh closed this issue 1 year ago

ianh commented 5 years ago

The following Python code:

```python
import gumbo

def enumerate(node):
    if node.type != node.type.ELEMENT:
        return
    elem = node.v.element
    tag_name = elem.tag_name
    for child in elem.children:
        enumerate(child)

with gumbo.parse(u'<path>') as html:
    enumerate(html.contents.root.contents)
```

produces this exception when run:

```
Traceback (most recent call last):
  File "reduced.py", line 12, in <module>
    enumerate(html.contents.root.contents)
  File "reduced.py", line 9, in enumerate
    enumerate(child)
  File "reduced.py", line 9, in enumerate
    enumerate(child)
  File "reduced.py", line 7, in enumerate
    tag_name = elem.tag_name
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gumbo/gumboc.py", line 280, in tag_name
    return str(original_tag).lower()
TypeError: __str__ returned non-string (type bytes)
```
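
For context, the same message can be reproduced without gumbo at all. This is a minimal sketch, assuming the object passed to `str()` has a `__str__` that returns the raw bytes read from the C library. The `BytesBacked` and `Decoded` classes and the `try_str` helper are hypothetical stand-ins and are not part of gumboc.py:

```python
class BytesBacked:
    """Hypothetical stand-in for a wrapper around raw bytes from a C library."""

    def __init__(self, data: bytes):
        self.data = data

    def __str__(self):
        # Acceptable on Python 2, where str is a byte string.
        # On Python 3, __str__ must return str, so str(obj) raises TypeError.
        return self.data


class Decoded(BytesBacked):
    """Same wrapper, but __str__ decodes the bytes before returning them."""

    def __str__(self):
        # Assumption: the underlying bytes are UTF-8 encoded.
        return self.data.decode('utf-8')


def try_str(obj):
    """Return str(obj), or the TypeError message if the conversion fails."""
    try:
        return str(obj)
    except TypeError as exc:
        return 'TypeError: {}'.format(exc)


broken = try_str(BytesBacked(b'PATH'))
fixed = str(Decoded(b'PATH')).lower()

print(broken)
print(fixed)
```

If the wrapper in gumboc.py behaves like `BytesBacked`, decoding inside `__str__` (or decoding the bytes before calling `.lower()`) would be one way to avoid the exception on Python 3.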