[fix] Invalid protocols are matched

ghost commented 5 months ago

Describe the bug

Node.js version: v18.18.2

OS version: macOS 14.2.1

Description: If a valid protocol has extra characters preceding it, the extra characters are included in the match.

Actual behavior

urlRegexSafe({ strict: true }).exec("gaewggwhttp://localhost:3000/derp")

This produces a match that contains the entire string, including the leading "gaewggw".

This also happens when strict is set to false.

Expected behavior

The match that is produced does not include the leading "gaewggw"; it starts at "http://".

Code to reproduce

const urlRegexSafe = require('url-regex-safe')
const match = urlRegexSafe({ strict: true }).exec("gaewggwhttp://localhost:3000/derp")
console.log(match)

Checklist

[x] I have searched through GitHub issues for similar issues.
[x] I have completely read through the README and documentation.
[x] I have tested my code with the latest version of Node.js and this package and confirmed it is still not working.

ghost commented 5 months ago

Okay, I see that the matcher for the protocol will optionally match any a-z string followed by :, and then //. I guess I assumed from the docs that a valid protocol was one that was also known (which I realize isn't well-defined in itself). Maybe the answer to this is to just clarify what the definition of a valid protocol is in the docs. I could also see an option to only match a certain set of well-known or user-provided protocols being a solution too.
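For anyone who needs a workaround in the meantime, here is a rough sketch of the "only accept known protocols" idea as a post-processing step. It assumes the behavior described above. The `looseProtocolUrl` pattern is a simplified stand-in and not the library's actual regex, and `KNOWN_PROTOCOLS` and `trimToKnownProtocol` are hypothetical names that are not part of url-regex-safe.

```javascript
// Simplified stand-in for the protocol matching described above; NOT the
// library's real pattern. Any run of letters followed by "://" is accepted.
const looseProtocolUrl = /[a-z]+:\/\/\S+/i;

// Hypothetical allowlist; adjust it to the schemes you care about.
const KNOWN_PROTOCOLS = ['http', 'https', 'ftp'];

// Hypothetical helper: if the matched scheme ends with a known protocol,
// drop the characters in front of it; return null if no known protocol fits.
function trimToKnownProtocol(match, protocols = KNOWN_PROTOCOLS) {
  const sep = match.indexOf('://');
  if (sep === -1) return match; // no explicit scheme, leave the match alone
  const scheme = match.slice(0, sep).toLowerCase();
  // Check longer names first so "https" wins over "http".
  const known = [...protocols]
    .sort((a, b) => b.length - a.length)
    .find((p) => scheme.endsWith(p));
  if (!known) return null;
  return match.slice(sep - known.length);
}

const raw = looseProtocolUrl.exec('gaewggwhttp://localhost:3000/derp')[0];
console.log(raw); // gaewggwhttp://localhost:3000/derp
console.log(trimToKnownProtocol(raw)); // http://localhost:3000/derp
```

This only trims or rejects a match after the fact. It does not change what the regex itself considers a valid protocol, so an option inside the library would still be the cleaner fix.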

iim-norse commented 5 months ago

I have almost the same problem with uppercase letters: the regex does not handle Site.Com, and Site.com is matched as ite.com.
