abclution opened this issue 3 years ago
iptables -L
Chain f2b-apacherepeatoffender (1 references)
DROP       all  --  crawl-66-249-64-151.googlebot.com   anywhere
globalblacklist.conf contains the goodbot setting BrowserMatchNoCase "(?:\b)Googlebot(?:\b)" good_bot
I also manually added it to whitelist-domains:
SetEnvIfNoCase Referer ~*googlebot.com good_ref
And it's still blocking it. Why?
Rather disable the repeatoffender jail, it can be troublesome, and just rely on what the blocker does.
Hi Mitchell, thanks for replying.
Yes, I just realized after I updated it was a problem with the F2B situation, not the ABBB. Derp.
If I keep getting the issue I'll give it a try, but the jail really helps a lot when there are botnets attacking constantly.
What I did for now, for better or worse, is grab the IPs (from the ABBB global list) for Bing/Google/Cloudflare and add them to jail.local like this:
[DEFAULT]
ignoreip = 108.177.0.0/17 172.217.0.0/16 173.194.0.0/16 2001:4860:4000::/36 203.208.60.0/24 207.126.144.0/20 209.85.128.0/17 216.239.32.0/19 216.58.192.0/19 2404:6800:4000::/36 2607:f8b0:4000::/36 2800:3f0:4000::/36 2a00:1450:4000::/36 2c0f:fb50:4000::/36 35.192.0.0/12 64.18.0.0/20 64.233.160.0/19 64.68.80.0/21 65.52.0.0/14 66.102.0.0/20 66.249.64.0/19 72.14.192.0/18 74.125.0.0/16 131.253.21.0/24 131.253.22.0/23 131.253.24.0/21 131.253.24.0/22 131.253.32.0/20 157.54.0.0/15 157.56.0.0/14 157.60.0.0/16 199.30.16.0/24 199.30.27.0/24 207.46.0.0/16 40.112.0.0/13 40.120.0.0/14 40.124.0.0/16 40.125.0.0/17 40.74.0.0/15 40.76.0.0/14 40.80.0.0/12 40.96.0.0/12 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2405:8100::/32 2405:b500::/32 2606:4700::/32 2803:f800::/32 2a06:98c0::/29 2c0f:f248::/32
Also, I followed the instructions here: https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example
I grabbed that script and used it as the ignorecommand for the apacherepeatoffender jail. In theory it should reverse-DNS the IP and possibly tell F2B to play nice; a rough sketch of the jail wiring is just below the host list.
ALLOWED_HOSTS = [ ".googlebot.com", ".search.msn.com", ".google.com"] etc
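For reference, this is roughly how I think the jail wiring should look (the script path and filename here are just examples, point it at wherever you saved the script; fail2ban skips the ban when the ignorecommand exits 0):

[apacherepeatoffender]
ignorecommand = /etc/fail2ban/ignorecommands/ignore-goodbots.py <ip>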
I'll update if I find some success. :)
Ninja edit: I'm also planning to try this if the Python script doesn't work:
https://deeb.me/20180320/how-not-to-ban-googlebot
Thanks!
The Python script from the linked Server Fault answer wouldn't work for me; it needed some small modifications.
I edited the script a bit later as it had some issues. It only seems to work with reverse-pointer-style hostnames like crawl-66-249-64-157.googlebot.com, and I don't know enough about Python regular expressions to make it more flexible.
#!/usr/bin/env fail2ban-python
# Inspired by the apache-fakegooglebot script
#
# Written in Python to reuse built-in Python batteries and not depend on
# the presence of the host and cut commands
# https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example
import sys
import re

from fail2ban.server.ipdns import DNSUtils, IPAddr

# Hosts whose reverse DNS domain should never be banned
ALLOWED_HOSTS = [
    ".googlebot.com",
    ".search.msn.com"]


def process_args(argv):
    # Expect exactly one argument: the IP fail2ban is about to ban
    if len(argv) != 2:
        raise ValueError("Please provide a single IP as an argument. Got: %s\n"
                         % (argv[1:]))
    ip = argv[1]
    if not IPAddr(ip).isValid:
        raise ValueError("Argument must be a single valid IP. Got: %s\n"
                         % ip)
    print("IP received!")
    return ip


def is_allowed_host(ip):
    # Reverse-resolve the IP and check its domain against ALLOWED_HOSTS
    host = DNSUtils.ipToName(ip)
    print(f"Host is {host}")
    if not host:
        return False
    # Only matches reverse-pointer-style hostnames such as
    # crawl-66-249-64-157.googlebot.com
    # m = re.search(r'.\S+(-\d+)(?P<domain>\.\S+)', host)
    m = re.match(r'.\S+(-\d+)(?P<domain>\.\S+)', host)
    print(f"Match: {m}")
    try:
        domain = m.group('domain')
        print(domain)
    except AttributeError:
        # The regex didn't match, so there is no 'domain' group to read
        return False
    if domain in ALLOWED_HOSTS:
        print("True")
        return True
    else:
        print("FALSE")
        return False


if __name__ == '__main__':  # pragma: no cover
    try:
        ret = is_allowed_host(process_args(sys.argv))
    except ValueError as e:
        sys.stderr.write(str(e))
        sys.exit(2)
    print(f"Return {ret}")
    # Exit 0 tells fail2ban to ignore (not ban) the IP, 1 means ban as usual
    sys.exit(0 if ret else 1)
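You can also sanity-check the script by hand before wiring it into the jail, since fail2ban only looks at the exit code (the path is just an example, and 66.249.66.55 is one of the Googlebot IPs from my logs below):

/etc/fail2ban/ignorecommands/ignore-goodbots.py 66.249.66.55
echo $?   # 0 = ignore (don't ban), 1 = ban as usual, 2 = invalid argument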
iptables -L
DROP       all  --  crawl-66-249-66-55.googlebot.com   anywhere
DROP       all  --  crawl-66-249-66-53.googlebot.com   anywhere
DROP       all  --  crawl-66-249-66-51.googlebot.com   anywhere
DROP       all  --  crawl-66-249-66-55.googlebot.com   anywhere
DROP       all  --  crawl-66-249-66-53.googlebot.com   anywhere
DROP       all  --  crawl-66-249-66-51.googlebot.com   anywhere
As read here: https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot?visit_id=637564291852388228-1847563569&rd=1
From what I read, the best way to NOT block Googlebot now is to make sure the reverse DNS of the IP resolves to a name under the googlebot.com domain (and, per that doc, to forward-confirm that name back to the original IP); a rough sketch of that check is below.
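For anyone else landing here, this is roughly what that verification looks like in plain Python (standard library only; the function name is just mine, and it only covers the reverse-then-forward lookup described in Google's doc):

import socket

def is_verified_googlebot(ip):
    # Step 1: reverse DNS lookup on the client IP
    try:
        host, _, _ = socket.gethostbyaddr(ip)
    except socket.herror:
        return False
    # Step 2: the name must be under googlebot.com (or google.com per the doc)
    if not host.endswith(('.googlebot.com', '.google.com')):
        return False
    # Step 3: forward-confirm that the name resolves back to the original IP
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except socket.gaierror:
        return False

print(is_verified_googlebot('66.249.66.55'))  # should print True for a genuine Googlebot IP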
I'm a bit new to the AUBBB so I'm not 100% sure it's not something I did, but other than updating the global blacklist I didn't change anything, and it wasn't blocking Googlebots before. I've only been running AUBBB for a week and a half and I'm still learning.
I grabbed the logs but nothing really stands out; I'm just resetting the ban for now and seeing if it happens again.
66.249.66.55 - - [02/May/2021:15:05:10 +0300] "GET /index.php/default/shop-by-brand/rotair/air-compressors.html HTTP/1.1" 403 407 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.130 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
Also, PS to Mitchell and the other contributors: amazing work! Saved my ass, my server was being destroyed by bots.