pantherman594 / BungeeEssentials

Full customization of a few necessary features for your server!
http://www.spigotmc.org/resources/bungeeessentials.1488/
GNU General Public License v3.0

Poorly Formatted Regex String (Rules Module, Anti-Advertisement) #36

Closed: babel5405 closed this issue 8 years ago

babel5405 commented 8 years ago

The string ^.+((\.|dot).+)*(\.|dot)\s*[a-zA-Z]{2,5}, which is provided on line 24 of the default messages.yml file, matches far more than it should. Specifically, it matches any message that contains a period with content after it.

Examples include: "http://url.com/", "url.com", "this is a url. com", and "this is a sentence. which shouldn't match".

This results in annoyed players and staff who have to work around the poorly formatted regex. Using the following string instead will produce better matches, though it could miss cases where a user is deliberately trying to bypass the filter.

^.+((\.|dot).+)*(\.|dot)[^\s]*[a-zA-Z]{2,5}

This string will not match when white space follows the period, so messages with multiple sentences are allowed again. While not a perfect filter, it should stop most cases where the filter catches more than it is actually supposed to.
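
To make the difference concrete, here is a minimal sketch that runs both strings against the examples above. It assumes the patterns are written with escaped dots and `*` quantifiers (the form the revised string takes later in this thread), and it uses Python's `re` module purely for illustration; the plugin's own regex engine may differ in edge cases. The `flagged` helper is a hypothetical name, not part of BungeeEssentials.

```python
import re

# Assumed reconstruction of the two strings (escaped dots, `*` quantifiers).
ORIGINAL = re.compile(r"^.+((\.|dot).+)*(\.|dot)\s*[a-zA-Z]{2,5}")
REVISED = re.compile(r"^.+((\.|dot).+)*(\.|dot)[^\s]*[a-zA-Z]{2,5}")

SAMPLES = [
    "http://url.com/",
    "url.com",
    "this is a url. com",
    "this is a sentence. which shouldn't match",
]


def flagged(pattern, message):
    """Return True if the pattern matches anywhere in the message."""
    return pattern.search(message) is not None


if __name__ == "__main__":
    for message in SAMPLES:
        print(
            f"{message!r}: "
            f"original={flagged(ORIGINAL, message)}, "
            f"revised={flagged(REVISED, message)}"
        )
```

Under these assumptions the original string flags all four samples, while the revised one flags only the two without white space after the period. That also shows the trade-off mentioned above: "this is a url. com" slips through the revised string.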

pantherman594 commented 8 years ago

Would this work better? \b(([\w\d]{2,})\s*.(\.|d.t).{0,5})+(net|com|xyz|me|org|site|pw|top|io|co|biz)

babel5405 commented 8 years ago

In theory yes. The issue with that is the sheer number of TLDs now available, so while that covers most domains, you will still miss quite a large number of them. I think the system you're using to detect structure rather than content is good; it just needs a bit more refinement. Consider checking for three letters after the dot followed by a / or whitespace, as well as another dot to handle co.uk domains and the like.
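
One possible reading of that suggestion, as a rough sketch only: the pattern below is hypothetical and was not proposed in this thread or taken from the plugin. It assumes two or three letters after a literal dot (two so that "co" in co.uk qualifies), followed by a /, whitespace, another dot, or the end of the message. Python's `re` module is used for illustration, and `looks_like_domain` is a made-up helper name.

```python
import re

# Hypothetical structural check: a dot, two or three letters, then "/",
# whitespace, another dot, or the end of the message (via a lookahead).
STRUCTURAL = re.compile(r"\.[a-zA-Z]{2,3}(?=[/\s.]|$)")


def looks_like_domain(message):
    """Return True if the message contains a domain-shaped fragment."""
    return STRUCTURAL.search(message) is not None


if __name__ == "__main__":
    for message in (
        "join play.example.com now",
        "visit example.co.uk",
        "this is a sentence. which shouldn't match",
    ):
        print(f"{message!r}: {looks_like_domain(message)}")
```

A check like this leaves ordinary sentences alone because a period followed by a space never has letters directly after it, but it would still miss TLDs longer than three letters and would flag things like file names.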

pantherman594 commented 8 years ago

If your main concern is blocking Minecraft servers, I think it should be good enough. I doubt large servers would use any obscure TLDs. It wouldn't work with small servers that use numeric IPs or some little-used TLD. In my opinion, the best way to prevent advertisers would be to include both \b(([\w\d]{2,})\s*.(\.|d.t).{0,5})+(net|com|xyz|me|org|site|pw|top|io|co|biz) to block most of them with fairly low bypass-ability and ^.+((\.|dot).+)*(\.|dot)[^\s]*[a-zA-Z]{2,5} to catch the other ones, though the latter can be bypassed more easily.
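
As a rough sketch of how the two strings could be combined, the snippet below flags a message if either pattern matches. The patterns are copied from the comment above; everything else (the names `TLD_LIST`, `STRUCTURE`, and `is_advertisement`, and the use of Python's `re` module instead of the plugin's own engine) is an assumption made for illustration.

```python
import re

# Both strings as quoted in the comment above.
TLD_LIST = re.compile(
    r"\b(([\w\d]{2,})\s*.(\.|d.t).{0,5})+"
    r"(net|com|xyz|me|org|site|pw|top|io|co|biz)"
)
STRUCTURE = re.compile(r"^.+((\.|dot).+)*(\.|dot)[^\s]*[a-zA-Z]{2,5}")


def is_advertisement(message):
    """Flag the message if either pattern matches anywhere in it."""
    return any(p.search(message) for p in (TLD_LIST, STRUCTURE))


if __name__ == "__main__":
    for message in (
        "join url.com today",
        "this is a url. com",
        "this is a sentence. which shouldn't match",
        "join 203.0.113.5",
    ):
        print(f"{message!r}: {is_advertisement(message)}")
```

In this sketch the TLD-list pattern picks up "this is a url. com", which the structural pattern alone lets through, while a plain sentence and a bare numeric IP are not flagged by either.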

babel5405 commented 8 years ago

That makes sense to me. A combination of the two would be more robust than trying to simply catch everything.

pantherman594 commented 8 years ago

I'll add both into the next update. Thanks for bringing up this issue!