mitchellkrogza / apache-ultimate-bad-bot-blocker

Apache Block Bad Bots, (Referer) Spam Referrer Blocker, Vulnerability Scanners, Malware, Adware, Ransomware, Malicious Sites, Wordpress Theme Detectors and Fail2Ban Jail for Repeat Offenders

Updated yesterday, 2021-05-13, suddenly seem to be blocking Googlebot. #157

Open abclution opened 3 years ago

abclution commented 3 years ago

iptables -L

DROP all -- crawl-66-249-66-55.googlebot.com anywhere
DROP all -- crawl-66-249-66-53.googlebot.com anywhere
DROP all -- crawl-66-249-66-51.googlebot.com anywhere
DROP all -- crawl-66-249-66-55.googlebot.com anywhere
DROP all -- crawl-66-249-66-53.googlebot.com anywhere
DROP all -- crawl-66-249-66-51.googlebot.com anywhere

As read here: https://developers.google.com/search/docs/advanced/crawling/verifying-googlebot?visit_id=637564291852388228-1847563569&rd=1

From what I read, the best way to NOT block Googlebot now is to check that the reverse DNS name of the IP is in the googlebot.com domain.
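For anyone wanting to try that check by hand, here is a minimal sketch of the two-step lookup the linked Google page describes: reverse-resolve the IP, check the domain, then forward-resolve the name and confirm it points back to the same IP. The function names and the suffix list are my own assumptions for illustration, not part of the blocker or of Fail2Ban, and the resolver functions are passed in as parameters so the logic can be exercised without touching DNS.

```python
import socket

# Assumed suffix list, based on the domains the linked Google page mentions.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(host):
    """Return the set of addresses that host resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(host, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check (hypothetical helper)."""
    host = reverse(ip)
    if not host:
        return False
    # Step 1: the PTR name must end in one of the expected domains.
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    # Step 2: the name must resolve back to the original IP, since anyone
    # who controls their own reverse zone can publish a fake PTR record.
    return ip in forward(host)
```

Calling `is_verified_googlebot("66.249.66.55")` with the default resolvers performs real DNS queries, so the result depends on the resolver available on the server.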

I'm a bit new to the AUBBB, so I'm not 100% sure it's not something I did, but other than updating the global blacklist, I didn't change anything, and it wasn't blocking Googlebot before. I have only been running AUBBB for a week and a half and I'm still learning.

I grabbed the logs but nothing really stands out, so for now I'm just resetting the ban and seeing if it happens again.

66.249.66.55 - - [02/May/2021:15:05:10 +0300] "GET /index.php/default/shop-by-brand/rotair/air-compressors.html HTTP/1.1" 403 407 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.130 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Also, PS to Mitchell and other contributors, amazing work! Saved my ass, my server has been destroyed by bots.

abclution commented 3 years ago

iptables -L

Chain f2b-apacherepeatoffender (1 references)
DROP all -- crawl-66-249-64-151.googlebot.com anywhere

globalblacklist.conf contains the good-bot setting:

BrowserMatchNoCase "(?:\b)Googlebot(?:\b)" good_bot

I also manually added this to whitelist-domains:

SetEnvIfNoCase Referer ~*googlebot.com good_ref

And it's still blocking it. Why?

mitchellkrogza commented 3 years ago

Rather disable the repeatoffender jail; it can be troublesome. Just rely on what the blocker does.

abclution commented 3 years ago

Hi Mitchell thanks for replying.

Yes, I just realized after I updated that it was a problem with the F2B situation, not the AUBBB. Derp.

If I keep getting the issue, I'll give it a try, but the jail really helps a lot when there are botnets attacking constantly.

What I did for now, for better or worse, is grab the IPs (from the AUBBB global list) for Bing/Google/Cloudflare and add them to jail.local like this.

[DEFAULT]

ignoreip = 108.177.0.0/17 172.217.0.0/16 173.194.0.0/16 2001:4860:4000::/36 203.208.60.0/24 207.126.144.0/20 209.85.128.0/17 216.239.32.0/19 216.58.192.0/19 2404:6800:4000::/36 2607:f8b0:4000::/36 2800:3f0:4000::/36 2a00:1450:4000::/36 2c0f:fb50:4000::/36 35.192.0.0/12 64.18.0.0/20 64.233.160.0/19 64.68.80.0/21 65.52.0.0/14 66.102.0.0/20 66.249.64.0/19 72.14.192.0/18 74.125.0.0/16 131.253.21.0/24 131.253.22.0/23 131.253.24.0/21 131.253.24.0/22 131.253.32.0/20 157.54.0.0/15 157.56.0.0/14 157.60.0.0/16 199.30.16.0/24 199.30.27.0/24 207.46.0.0/16 40.112.0.0/13 40.120.0.0/14 40.124.0.0/16 40.125.0.0/17 40.74.0.0/15 40.76.0.0/14 40.80.0.0/12 40.96.0.0/12 103.21.244.0/22 103.22.200.0/22 103.31.4.0/22 104.16.0.0/13 104.24.0.0/14 108.162.192.0/18 131.0.72.0/22 141.101.64.0/18 162.158.0.0/15 172.64.0.0/13 173.245.48.0/20 188.114.96.0/20 190.93.240.0/20 197.234.240.0/22 198.41.128.0/17 2400:cb00::/32 2405:8100::/32 2405:b500::/32 2606:4700::/32 2803:f800::/32 2a06:98c0::/29 2c0f:f248::/32
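
A quick way to sanity-check that a banned address is actually covered by one of those ranges is Python's standard `ipaddress` module. This is only a sketch: `IGNORE_RANGES` holds a handful of the ranges from the line above rather than the full list, and `is_ignored` is a hypothetical helper name, not something Fail2Ban provides.

```python
import ipaddress

# A few of the ranges from the ignoreip line above (not the full list).
IGNORE_RANGES = "66.249.64.0/19 64.233.160.0/19 2001:4860:4000::/36 157.56.0.0/14"


def is_ignored(ip, ranges=IGNORE_RANGES):
    """True if ip falls inside any of the whitespace-separated CIDR ranges."""
    addr = ipaddress.ip_address(ip)
    for cidr in ranges.split():
        net = ipaddress.ip_network(cidr)
        # Only compare addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return True
    return False


# The two Googlebot addresses from the iptables output, plus an unrelated one.
for banned in ("66.249.66.55", "66.249.64.151", "203.0.113.7"):
    print(banned, is_ignored(banned))
```

Both Googlebot addresses from the earlier iptables output fall inside 66.249.64.0/19, so a static list covers them for as long as the crawler keeps using those ranges.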

Also, I followed the instructions here: https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example

I grabbed that script and used it as the ignorecommand for the apacherepeatoffender jail. In theory it should do a reverse DNS lookup on the IP and possibly tell F2B to play nice.

ALLOWED_HOSTS = [ ".googlebot.com", ".search.msn.com", ".google.com"] etc

I'll update if I find some success. :)

Ninjaedit: I'm also planning to try this if the Python script doesn't work:

https://deeb.me/20180320/how-not-to-ban-googlebot

Thanks!

abclution commented 3 years ago

The Python script from the linked Server Fault question wouldn't work for me; it needed some small modifications.

I edited this script a bit later as it had some issues. It only seems to work with reverse-pointer-style hostnames.

Like, crawl-66-249-64-157.googlebot.com

And I don't know enough about Python regular expressions to make it more flexible.


#!/usr/bin/env fail2ban-python
# Inspired by apache-fakegooglebot script
#
# Written in Python to reuse built-in Python batteries and not depend on
# presence of host and cut commands
# https://serverfault.com/questions/561088/fail2ban-ignoreip-dns-host-example
import sys
import re
from fail2ban.server.ipdns import DNSUtils, IPAddr

ALLOWED_HOSTS = [
        ".googlebot.com",
        ".search.msn.com"]

def process_args(argv):
    # Expects exactly one argument: the IP address to check
    if len(argv) != 2:
        raise ValueError("Please provide a single IP as an argument. Got: %s\n"
                         % (argv[1:]))
    ip = argv[1]

    if not IPAddr(ip).isValid:
        raise ValueError("Argument must be a single valid IP. Got: %s\n"
                         % ip)
    print("Ip received!")

    return ip

def is_allowed_host(ip):
    # Reverse-resolve the IP; bail out if there is no usable PTR name
    host = DNSUtils.ipToName(ip)
    print(f"Host is {host}")
    if not host:
        return False

    # Only matches crawler-style PTR names such as
    # crawl-66-249-64-157.googlebot.com: whatever follows the last
    # "-<digits>" group before a dot is captured as the domain suffix.
    m = re.match(r'.\S+(-\d+)(?P<domain>\.\S+)', host)
    print(f"Match: {m}")
    if m is None:
        return False

    domain = m.group('domain')
    print(domain)
    allowed = domain in ALLOWED_HOSTS
    print("True" if allowed else "FALSE")
    return allowed

if __name__ == '__main__': # pragma: no cover
    try:
        ret = is_allowed_host(process_args(sys.argv))
    except ValueError as e:
        sys.stderr.write(str(e))
        sys.exit(2)

    print(f"Return {ret}")
    # Exit status 0 tells fail2ban to ignore (not ban) this IP
    sys.exit(0 if ret else 1)
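
On the regex limitation: one possible way to make the match more flexible is to skip the regex and compare the end of the hostname against each allowed suffix. The sketch below is standalone and untested inside Fail2Ban; `host_allowed` is a hypothetical helper name, and its default suffixes simply mirror the ALLOWED_HOSTS list in the script above.

```python
def host_allowed(host, allowed_suffixes=(".googlebot.com", ".search.msn.com")):
    """Suffix match on the hostname instead of a regex capture."""
    if not host:
        return False
    # Normalise case and drop a trailing root dot before comparing.
    name = host.strip().lower().rstrip(".")
    # The leading dot in each suffix keeps "evilgooglebot.com" from matching.
    return name.endswith(tuple(allowed_suffixes))


print(host_allowed("crawl-66-249-64-157.googlebot.com"))
print(host_allowed("evilgooglebot.com"))
```

Because it only looks at the tail of the name, this accepts any hostname under the allowed domains, not just ones shaped like crawl-66-249-64-157. A reverse DNS name can be set to anything by whoever controls the IP's reverse zone, so a suffix match on its own is a weaker guarantee than also confirming the name resolves back to the same IP.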