Segmentation fault accessing attributes

rushter / selectolax

Python binding to Modest and Lexbor engines (fast HTML5 parser with CSS selectors).

MIT License


import urllib.request
from selectolax.lexbor import LexborHTMLParser

with urllib.request.urlopen("https://rhodes-ltd-339.myshopify.com") as response:
    data = response.read()

html = data.decode("utf-8")
parser = LexborHTMLParser(html)

for elem in parser.head.iter():
    print("tag", elem.tag)
    # Segmentation fault occurs when accessing attributes
    print("attributes", elem.attributes)

print("done")

Commenting to report another case where the Lexbor backend causes a segmentation fault but the Modest backend works:

Causes a segmentation fault:

from selectolax.lexbor import LexborHTMLParser

parser = LexborHTMLParser("")
for node in parser.root.traverse():
    # Reading an attribute of the node's parent crashes with Lexbor
    value = node.parent.attributes.get("anything")

print("done")

Works as expected:

from selectolax.parser import HTMLParser

parser = HTMLParser("")
for node in parser.root.traverse():
    # The same access completes normally with Modest
    value = node.parent.attributes.get("anything")

print("done")

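With Lexbor, the issue seems to be that when the parser generates HTML elements that are not in the input (for an empty string, the implied `html`, `head`, and `body` elements), the parents of those generated elements do not have valid `.attributes` in some cases.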
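Because a segfault kills the whole interpreter, one way to compare the two backends on the same input is to run each snippet in a child process and inspect how it exited. This is a minimal sketch, assuming a POSIX system, where a process killed by signal N reports a return code of -N. `run_isolated` and `LEXBOR_REPRO` are hypothetical names, not part of selectolax. The `-X faulthandler` option makes the child print its Python traceback at the moment of the crash, which helps pinpoint the line that touched `.attributes`.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 30.0) -> tuple[bool, str]:
    """Run `code` in a fresh Python interpreter (hypothetical helper).

    Returns (crashed, stderr). `crashed` is True only if the child was
    killed by SIGSEGV. faulthandler is enabled in the child, so a
    segfault also dumps the Python-level traceback to stderr.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a child terminated by signal N has returncode == -N.
    crashed = result.returncode == -signal.SIGSEGV
    return crashed, result.stderr


# Hypothetical constant holding the Lexbor reproduction from this issue.
LEXBOR_REPRO = """
from selectolax.lexbor import LexborHTMLParser
parser = LexborHTMLParser("")
for node in parser.root.traverse():
    node.parent.attributes.get("anything")
"""

if __name__ == "__main__":
    crashed, stderr = run_isolated(LEXBOR_REPRO)
    print("crashed:", crashed)
    print(stderr)
```

If selectolax is not installed, the child exits with an ImportError rather than a signal, so `crashed` is False; the captured `stderr` shows which case occurred. Swapping in the `HTMLParser` snippet gives the Modest side of the comparison.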


Segmentation fault accessing attributes #135