spamscanner / url-regex-safe

Regular expression matching for URLs. A maintained, safe, and browser-friendly version of url-regex. Resolves CVE-2020-7661 for Node.js servers.
https://forwardemail.net/docs/url-regex-javascript-node-js
MIT License

Wrong match in html #1

Closed by emanuelwo 3 years ago

emanuelwo commented 3 years ago

The matching results for this HTML string are wrong: the second match includes the trailing closing tags.

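const urlRegexSafe = require('url-regex-safe');
// Single quotes outside so the double-quoted href attribute does not end the string
const text = '<div><a href="https://www.test.com/">https://www.test.com/</a></div>';
text.match(urlRegexSafe());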

["https://www.test.com/", "https://www.test.com/</a></div>"]

niftylettuce commented 3 years ago

You need to parse the HTML and strip the tags before matching, like we did in @spamscanner.

Here's an example snippet from that codebase:

https://github.com/spamscanner/spamscanner/blob/1f5645f1a9b53eb3e6fae340e0353b1348f54459/index.js#L954-L963
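For readers who want the general idea without opening the link, here is a minimal, self-contained sketch of "strip the tags first, then match". It is not the spamscanner code. `simpleUrlRegex`, `extractText`, and `findUrls` are hypothetical names, the URL pattern is a deliberately simplified stand-in for `urlRegexSafe()`, and the regex-based tag stripper is for illustration only. In a real project you would likely use a proper HTML parser or the `striptags` package instead.

```javascript
// Hypothetical, simplified stand-in for urlRegexSafe(); the real pattern is far more thorough.
const simpleUrlRegex = () => /https?:\/\/[^\s<>"']+/g;

// Naive text extraction (assumption: well-formed, simple HTML).
// href/src values are collected first so URLs that only appear in attributes are not lost.
function extractText(html) {
  const attrValues = [];
  const attrPattern = /(?:href|src)\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  for (const m of html.matchAll(attrPattern)) {
    attrValues.push(m[1] ?? m[2]);
  }
  // Replace every tag with a space so adjacent text nodes do not run together.
  const text = html.replace(/<[^>]*>/g, ' ');
  return [text, ...attrValues].join(' ');
}

// Match URLs on the stripped text and drop duplicates.
function findUrls(html) {
  const matches = extractText(html).match(simpleUrlRegex()) || [];
  return [...new Set(matches)];
}

const html = '<div><a href="https://www.test.com/">https://www.test.com/</a></div>';
console.log(findUrls(html)); // [ 'https://www.test.com/' ]
```

With the tags removed before matching, the closing `</a></div>` can no longer be absorbed into a match. Swapping `simpleUrlRegex()` for `urlRegexSafe()` should keep the same overall shape.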