Closed by po6ix 2 years ago
Thanks for the report.
Quoting the readme:
Per the design, it intends to parse massive HTML files in lowest price, thus the performance is the top priority. For this reason, some malformatted HTML may not be able to parse correctly
Considering that the library takes input HTML and produces a resulting set of nodes, XSS isn't really in the purview of this library. (How would we even distinguish between what was user input and what was not?) If user input is taken with the intention of threading it into HTML, the responsibility for validating that input falls on the software that precedes the call to this library. The library's purpose is to provide a minimally manipulable DOM tree for the provided HTML at minimal cost.
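To make the "validate before the call" point concrete, here is a minimal sketch of escaping user input before it is threaded into markup. The names escapeHtml and renderComment are hypothetical helpers written for illustration; they are not part of node-html-parser, and a real application would likely lean on an established sanitizer or templating layer instead.

```typescript
// Hypothetical helper (not part of node-html-parser): escapes the characters
// that are significant in HTML text and in quoted attribute values.
function escapeHtml(untrusted: string): string {
  return untrusted
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical template function: the user-controlled value is escaped
// before it becomes part of the HTML string handed to any parser.
function renderComment(userInput: string): string {
  return `<p class="comment">${escapeHtml(userInput)}</p>`;
}

console.log(renderComment("<img src=x onerror=alert(1)>"));
```

The escaping happens in the calling code, so the parser only ever receives markup in which the user's text can no longer open a tag.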
Use of this library may be the right choice if there is a reasonable expectation of having valid HTML. If needs extend beyond this, using another library is advisable.
Details are here: https://blog.p6.is/writeups-for-hayyim-security-ctf-2022/#Solution-5
The problem is that node-html-parser parses unclosed tags and \x0b incorrectly, which can lead to a kind of mutation XSS even when the HTML string has been sanitized.
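One general way a \x0b mismatch can arise, offered as an assumption rather than a confirmed diagnosis of node-html-parser's internals: JavaScript's \s character class includes U+000B (vertical tab), while the HTML Standard's ASCII whitespace set (tab, LF, FF, CR, space) does not. A parser that splits on \s could therefore see a separator where a browser sees an ordinary character. The classify function below is a hypothetical helper that only demonstrates this difference.

```typescript
// JavaScript's \s class matches U+000B (vertical tab).
const jsWhitespace = /\s/;
// ASCII whitespace per the HTML Standard: tab, LF, FF, CR, space (no U+000B).
const htmlWhitespace = /[\t\n\f\r ]/;

// Hypothetical helper: reports how each definition classifies one character.
function classify(ch: string): { js: boolean; html: boolean } {
  return { js: jsWhitespace.test(ch), html: htmlWhitespace.test(ch) };
}

console.log(classify("\x0b")); // { js: true, html: false }
console.log(classify(" "));    // { js: true, html: true }
```

When a sanitizer, a parser, and the browser disagree on a character like this, markup that looked inert to one stage can be re-serialized into something another stage interprets differently, which is the general shape of a mutation XSS.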