rayd / html-parse-stringify2

Parses well-formed HTML (meaning all tags closed) into an AST and back. quickly.
21 stars 11 forks source link

Regular Expression Denial of Service possible #26

Open andyedwardsibm opened 3 years ago

andyedwardsibm commented 3 years ago

https://snyk.io/vuln/SNYK-JS-HTMLPARSESTRINGIFY2-1079307 has been raised for a ReDoS vulnerability, along with CVE-2021-23346. The vulnerability is at https://github.com/rayd/html-parse-stringify2/blob/dbf026f9010a14167e5a9e8589464f660233e446/lib/parse.js#L2

There is a recent fix in the original repo this was forked from. Could the same fix be applied here?

dnsmob commented 3 years ago

i'm sort of wondering if this repo is sorta abandoned or even if it's sort of relevant? as per the description, This could be temporary - I'll gladly drop this if activity picks back up on the original repo. while the original has got it fixed..

andyedwardsibm commented 3 years ago

I fear you're right

The CVE is "applicable", as the following command does not return in a timely manner...

node -e "const p=require('./index.js'); p.parse(\"<\!'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''\!\"); console.log('done')"

Replacing https://github.com/rayd/html-parse-stringify2/blob/dbf026f9010a14167e5a9e8589464f660233e446/lib/parse.js#L2 with /(?:<!--[\S\s]*?-->|<(?:"[^"]*"|'[^']*'|[^'">])+>)/g returns in milliseconds.

I'd offer a PR in the hope that this repo is still maintained, but I can't get the test case in https://github.com/HenrikJoreteg/html-parse-stringify/commit/c7274a48e59c92b2b7e906fedf9065159e73fe12 to ever fail. For some reason, running under test, the RegEx fails to match much more quickly than when calling the parse code directly :confused:

For now, I guess that anyone else that searches for this CVE will end up here as well

dnsmob commented 3 years ago

i did the same and tests on this repo fail to match the html tag -- of course, it's not a "tag", it thinks it's valid <''>. so this test

    var tag = "<!''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''!"
    t.deepEqual(parseTag(tag), {
        type: 'tag',
        attrs: {},
        name: '',
        voidElement: false,
        children: []
    });

expects name to be \'\' which is kinda silly and probably means that that regex doesnt really do the job.

i did a hot swap of this module with node-html-parser in react-i18next as it has a parse function too, but it needs to be compiled to work as a dependency for me. on the bright side, this node parser doesnt have vulns 😃

andyedwardsibm commented 3 years ago

Just want to check what you mean above when you say

this node parser doesnt have vulns

If I run the following snippet, it consumes a CPU at 100% and doesn't return, therefore html-parse-stringify2 is vulnerable to the RegEx DoS attack...

require('html-parse-stringify2').parse("<!'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''!")
dnsmob commented 3 years ago

i meant node-html-parser doesnt have vulnerabilities and on a hot swap it seems to work (but tbh i havent checked the source code to see if it could be used as a replacement for html-parse-stringify2, i just did a simple test).

andyedwardsibm commented 3 years ago

Ah gotcha - thanks :smiley:

kachkaev commented 3 years ago

node-html-parser 2.0.1 and 2.0.2 cannot be used via npm unfortunately. Reason: https://github.com/HenrikJoreteg/html-parse-stringify/issues/42