spamscanner / url-regex-safe

Regular expression matching for URLs. Maintained, safe, and browser-friendly version of url-regex. Resolves CVE-2020-7661 for Node.js servers.
https://forwardemail.net/docs/url-regex-javascript-node-js
MIT License
79 stars, 16 forks

Issue with Email Addresses Getting pulled In #13

JimmyGalar closed this issue 2 years ago

JimmyGalar commented 2 years ago

So we allow our end users to send test strings out, and those strings can contain a mix of URLs and email addresses. We use an email regex function to pull out all email addresses and validate them against a whitelist that we have. We do the same for URLs, using url-regex-safe to pull them out of the string and validate them against a different whitelist.

The issue I am encountering is that url-regex-safe is pulling in portions of the email addresses, or their domains.

For example: "This is a test of our notification system, for any questions please go to www.test.com/info for further details. To get further information on the process please email test@test.com, or Bob.Smith@test.com."

url-regex-safe returns www.test.com/info, test.com, and Bob.Sm as matches to be evaluated.

Can anything be done to make the urlRegexSafe function exclude email addresses?

niftylettuce commented 2 years ago

You could first parse out emails, and then parse out URLs.

That's what I do in https://spamscanner.net (open source on GitHub too).
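A minimal sketch of the emails-first approach, in TypeScript. EMAIL_PATTERN, URL_PATTERN, the whitelists, and the helper names are hypothetical and deliberately simplified; they are not url-regex-safe's patterns, and in practice the URL step would call urlRegexSafe() instead. The important step is blanking out every email before looking for URLs, so the URL pattern never sees the local part or domain of an address.

```typescript
// Hypothetical, deliberately simplified patterns for illustration only.
// They are NOT the patterns used by url-regex-safe; in real code the URL
// step would use urlRegexSafe() rather than URL_PATTERN.
const EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g;
const URL_PATTERN = /\b(?:https?:\/\/)?[\w-]+(?:\.[\w-]+)+(?:\/[^\s,]*)?/gi;

// Hypothetical whitelists standing in for the ones described in the issue.
const ALLOWED_EMAIL_DOMAINS = new Set<string>(["test.com"]);
const ALLOWED_URL_HOSTS = new Set<string>(["www.test.com"]);

// 1. Pull out the emails.
// 2. Blank them out so the URL pattern cannot see any part of them.
// 3. Pull out URLs from what is left.
function extractEmailsThenUrls(text: string): { emails: string[]; urls: string[] } {
  const emails: string[] = text.match(EMAIL_PATTERN) ?? [];
  const withoutEmails = text.replace(EMAIL_PATTERN, " ");
  const urls: string[] = withoutEmails.match(URL_PATTERN) ?? [];
  return { emails, urls };
}

function emailDomain(email: string): string {
  return email.slice(email.lastIndexOf("@") + 1).toLowerCase();
}

// Drop an optional scheme, then keep everything before the first "/".
// (Simplified: ports and user:pass@ prefixes are not handled.)
function urlHost(url: string): string {
  return url.replace(/^[a-z][a-z\d+.-]*:\/\//i, "").split("/")[0].toLowerCase();
}

const message =
  "This is a test of our notification system, for any questions please go to " +
  "www.test.com/info for further details. To get further information on the " +
  "process please email test@test.com, or Bob.Smith@test.com.";

const { emails, urls } = extractEmailsThenUrls(message);
console.log(emails); // test@test.com and Bob.Smith@test.com
console.log(urls); // www.test.com/info only

// Without step 2, the same pattern also matches "test.com" (twice) and "Bob.Smith".
console.log(message.match(URL_PATTERN));

const emailsOk = emails.every((e) => ALLOWED_EMAIL_DOMAINS.has(emailDomain(e)));
const urlsOk = urls.every((u) => ALLOWED_URL_HOSTS.has(urlHost(u)));
console.log(emailsOk, urlsOk); // true true
```

Running the URL pattern on the raw message reproduces the kind of stray matches reported in the issue; running it on the email-free text leaves only the real link.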

JimmyGalar commented 2 years ago

I'm already doing that to work around what I found above; I was just hoping there was a way to tweak url-regex-safe to exclude the emails instead of doing the extra parsing.

pjotrsavitski commented 2 years ago

I just had a few cases that have both emails and URLs in them. Everything worked fine and the email addresses were left alone. I used it as urlRegexSafe({ strict: true }), which should disregard emails.
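As we read the url-regex-safe options, strict: true only accepts URLs that start with a protocol or with www., which would explain why the addresses' domains stop matching. The TypeScript sketch here approximates that behaviour with a hypothetical, simplified STRICT_URL_PATTERN (not the library's real pattern) to show both the effect and the trade-off: bare domains such as test.com/info are skipped as well.

```typescript
// Hypothetical approximation of "strict" behaviour: a match must begin with
// a protocol or with "www.". This is NOT url-regex-safe's real pattern; with
// the library you would call urlRegexSafe({ strict: true }) instead.
const STRICT_URL_PATTERN = /\b(?:https?:\/\/|www\.)[\w-]+(?:\.[\w-]+)*(?:\/[^\s,]*)?/gi;

function strictUrls(text: string): string[] {
  return text.match(STRICT_URL_PATTERN) ?? [];
}

const sample =
  "Go to www.test.com/info or https://test.com/help, or email test@test.com or Bob.Smith@test.com.";

// Email addresses have no protocol or "www." prefix, so they are never matched.
console.log(strictUrls(sample)); // www.test.com/info and https://test.com/help

// Trade-off: a bare domain with no protocol and no "www." is skipped too.
console.log(strictUrls("Details at test.com/info")); // no matches
```

If the whitelist only needs links users will actually click, this trade-off may be acceptable; if bare domains must be caught, the emails-first approach avoids it.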

niftylettuce commented 2 years ago

If you have failing tests to add, please add them!