wilsonzlin / minify-html

Extremely fast and smart HTML + JS + CSS minifier, available for Rust, Deno, Java, Node.js, Python, Ruby, and WASM
MIT License

Unbounded tree depth causes eventual segfault due to excessive stack frames #159

Open kevinhu opened 11 months ago

kevinhu commented 11 months ago

Hi! This is a super useful tool. I'm using this package for a scraper, and I ran into a segfault when running minify_html 0.11.1 (default settings) with this particular website: https://gist.github.com/kevinhu/1c60437a9cecf3b8c741c3f006d35b8f

To reproduce:

import minify_html

# Read the page that triggers the crash.
with open("./bad_website.html", "r") as f:
    long_html = f.read()

# Segfaults on 0.11.1 with default settings.
minified = minify_html.minify(long_html)

I also tried using minify_html_onepass, which fails gracefully with the following error:

SyntaxError: Closing tag name does not match opening tag (expected "span", got "a"). [Character 2824653]
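
Since a segfault takes the whole scraper down with it, one possible stopgap is to run the minifier in a child process and treat a non-zero exit as a failure. This is only a sketch: `run_isolated`, `minify_or_none`, `CHILD_SNIPPET`, and the 60-second timeout are hypothetical names and values chosen for illustration, and the only minify_html call assumed is `minify_html.minify(html)` as used in the reproduction above.

```python
import subprocess
import sys

# Child program: read HTML from stdin, write the minified HTML to stdout.
# It only relies on minify_html.minify(html), as in the reproduction above.
CHILD_SNIPPET = (
    "import sys, minify_html\n"
    "sys.stdout.write(minify_html.minify(sys.stdin.read()))\n"
)


def run_isolated(snippet, input_text, timeout=60):
    """Run a Python snippet in a child process.

    Returns the child's stdout, or None if the child exited with a
    non-zero status (including death by signal) or timed out.
    """
    try:
        result = subprocess.run(
            # -X utf8 makes the child's stdin/stdout use UTF-8.
            [sys.executable, "-X", "utf8", "-c", snippet],
            input=input_text,
            capture_output=True,
            encoding="utf-8",
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def minify_or_none(html):
    """Minify in a child process so a crash cannot kill the caller."""
    return run_isolated(CHILD_SNIPPET, html)
```

With this in place, a scraper could call `minify_or_none(long_html)` and fall back to the unminified HTML whenever it returns `None`. The cost is one interpreter start-up per document.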

wilsonzlin commented 9 months ago

I did a quick test and this appears to be due to an extremely deep tree; there were over 4000 stack frames before the segfault. This isn't really solvable but I could implement a feature to limit the max parse depth.
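
Until such a limit exists in the library, a caller could approximate one by estimating nesting depth before minifying. The sketch below uses Python's standard `html.parser`, which tokenizes without recursing per nesting level. It is not how minify-html parses: the depth it reports is a rough estimate (unclosed tags such as `<p>` or `<li>` inflate it), and `DepthCounter`, `estimate_max_depth`, `is_safe_to_minify`, and the default limit of 1000 are hypothetical choices, not values taken from the library.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they must not add depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class DepthCounter(HTMLParser):
    """Tracks a rough maximum element nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Clamp at zero so stray closing tags cannot go negative.
        self.depth = max(0, self.depth - 1)


def estimate_max_depth(html):
    """Return the deepest nesting level seen while scanning the markup."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth


def is_safe_to_minify(html, max_depth=1000):
    """Assumed guard: skip minification when the tree looks too deep."""
    return estimate_max_depth(html) <= max_depth
```

A scraper could then call `is_safe_to_minify(long_html)` first and keep the original HTML when it returns `False`. The right threshold depends on the stack size and on how many frames the parser uses per level, so it would need tuning.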