mitchellkrogza / nginx-ultimate-bad-bot-blocker

Nginx Block Bad Bots, Spam Referrer Blocker, Vulnerability Scanners, User-Agents, Malware, Adware, Ransomware, Malicious Sites, with anti-DDOS, Wordpress Theme Detector Blocking and Fail2Ban Jail for Repeat Offenders

[HELP REQUIRED] Random UA Strings (Regex) #427

Open mitchellkrogza opened 3 years ago

mitchellkrogza commented 3 years ago

Seeing a lot of these in my logs.

Anyone with a good regex pattern to catch these without causing any false positives elsewhere?

CYIKlzKkVht5
cycR79OhPQ2s
cTSZrlWbrDbO
cFRPT8fDhKAa
bu1O7QgnSOsW
b29qgczOlKlE
b13Z9ANtXRlr
Anib5PbeYsv0
AHqVqsZ40Jyy
a7jbdspvEBSv
9zmPWVROx0rW
9wNFquLmm4qv
9tBMyVd7Rhny
8O8bVK10jHcH
8GuxyjGR3BmF
8ELk2wy2l39P
7yN7Lg9LA8T4
63ySfc6mZdhe
59Lwd8MOorb9
4zfiJwgOu3xi
4P4oijUCSzOs
3two1dSN1QXF
3DQGbHWJwzQA
3dBqL4BQtieX
2z9vrMLycuKK
1XTYOgH89Dd1
1tuh0gDc8ZH5
1I4oK8gTByej
1cJaFjGPbUKj
0wppZpqdDUUP

issuefiler commented 3 years ago

Dumb computers treat Googlebot and tobegoGtlo the same because both consist only of letters. I’d call this “nearly impossible by traditional means,” but let me try:

Detect numbers–letters–numbers (e.g. 1I4oK8gTByej)

User agents usually don’t contain a token that goes numbers–letters–numbers. Because these strings are random, their characters are arranged in unusual ways. When you see 1I4, you flag it as a bad one. You could also require numbers–letters–numbers–letters to reduce false positives.
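
A minimal sketch of the numbers–letters–numbers check, in Python rather than an Nginx rule. The name `looks_random` is hypothetical, and the pattern assumes the whole User-Agent is a single 12-character alphanumeric token, as the samples in this thread appear to be:

```python
import re

# Assumption: the entire User-Agent is one 12-character alphanumeric token
# that contains a digit, then one or more letters, then a digit.
RANDOM_UA = re.compile(r"(?=.*\d[A-Za-z]+\d)[A-Za-z0-9]{12}")

def looks_random(user_agent: str) -> bool:
    """Return True if the string fits the numbers-letters-numbers heuristic."""
    return RANDOM_UA.fullmatch(user_agent) is not None

print(looks_random("1I4oK8gTByej"))  # contains the run "1I4"
print(looks_random("0wppZpqdDUUP"))  # only one digit, so no such run
print(looks_random("Googlebot"))     # no digits at all
```

Samples such as 0wppZpqdDUUP have only one digit, so this heuristic alone would not catch every string in the list.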

Filter by pronounceability

Names in user agents are usually easy for humans to pronounce, because they are meant to be. Score the pronounceability of each word with some linguistic magic, and flag the ones that are hard to pronounce as bad.
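
One crude stand-in for “pronounceability” is the longest run of characters that are not vowels. The sketch below assumes that proxy; the function names and the limit of 5 are hypothetical choices, not anything this blocker ships:

```python
VOWELS = set("aeiouAEIOU")

def longest_non_vowel_run(token: str) -> int:
    """Length of the longest stretch of consecutive non-vowel characters.

    Digits count as non-vowels, so "b5Pb" is a run of 4.
    """
    longest = current = 0
    for ch in token:
        if ch in VOWELS:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

def hard_to_pronounce(token: str, limit: int = 5) -> bool:
    # Assumption: 5 or more non-vowels in a row is "hard to pronounce".
    return longest_non_vowel_run(token) >= limit

print(hard_to_pronounce("cTSZrlWbrDbO"))  # long consonant run
print(hard_to_pronounce("Googlebot"))     # longest run is "gl"
print(hard_to_pronounce("KHTML"))         # acronym, also flagged
```

Acronyms like KHTML trip the same check, which is exactly the false-positive risk with this kind of filter, and a sample like Anib5PbeYsv0 slips under the limit.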

Let a well-trained AI detect them

Train an AI on known-good and known-bad user agents, over and over, and let it magically flag the bad ones.
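
As a toy illustration of the training idea, here is a character-bigram naive Bayes classifier in plain Python. The class name, the choice of model, and the list of “good” crawler names are all assumptions made for this sketch; the “bad” strings are taken from the list in this issue:

```python
import math
from collections import Counter

def bigrams(token: str) -> list:
    """Adjacent character pairs, case-sensitive."""
    return [token[i:i + 2] for i in range(len(token) - 1)]

class BigramNaiveBayes:
    """Naive Bayes over character bigrams with add-one smoothing."""

    def __init__(self, good, bad):
        self.counts = {"good": Counter(), "bad": Counter()}
        for label, tokens in (("good", good), ("bad", bad)):
            for token in tokens:
                self.counts[label].update(bigrams(token))
        # Vocabulary size: distinct bigrams seen in either class.
        self.vocab = len(set(self.counts["good"]) | set(self.counts["bad"]))
        total = len(good) + len(bad)
        self.log_prior = {
            "good": math.log(len(good) / total),
            "bad": math.log(len(bad) / total),
        }

    def log_score(self, token: str, label: str) -> float:
        counts = self.counts[label]
        denom = sum(counts.values()) + self.vocab
        score = self.log_prior[label]
        for bg in bigrams(token):
            score += math.log((counts[bg] + 1) / denom)
        return score

    def is_bad(self, token: str) -> bool:
        return self.log_score(token, "bad") > self.log_score(token, "good")

# Hypothetical training data: a few well-known crawler names as "good",
# a few strings from this issue as "bad".
GOOD = ["Googlebot", "bingbot", "DuckDuckBot", "Baiduspider",
        "YandexBot", "facebookexternalhit", "Twitterbot", "Applebot"]
BAD = ["CYIKlzKkVht5", "cycR79OhPQ2s", "bu1O7QgnSOsW", "9zmPWVROx0rW",
       "8GuxyjGR3BmF", "4zfiJwgOu3xi", "1XTYOgH89Dd1", "0wppZpqdDUUP"]

model = BigramNaiveBayes(GOOD, BAD)
print(model.is_bad("1XTYOgH89Dd1"))
print(model.is_bad("Googlebot"))
```

With only sixteen training strings the model mostly memorizes what it has seen; a fresh random string shares almost no bigrams with either class, so a usable version would need far more data than this.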

Or just do nothing for them

because none of this stops a sufficiently determined attacker from easily bypassing all of these elaborate filters. Would you introduce new filters whenever they lengthen the string, shorten it, stick to letters only, mimic good bots, or put Fortune 500 brand names in their UA?

Once randomized, the strings need not stay fixed; they can become whatever gets past your filters.

It’s beyond this blocker’s job and not worth the processing power needed. I’d rather focus on blocking branded bad bots.

mitchellkrogza commented 3 years ago

I gave up on this. Most seem to have stopped anyway, and anyone can just masquerade as Mozilla/5.0.