mitchellkrogza / nginx-ultimate-bad-bot-blocker

Nginx Block Bad Bots, Spam Referrer Blocker, Vulnerability Scanners, User-Agents, Malware, Adware, Ransomware, Malicious Sites, with anti-DDOS, Wordpress Theme Detector Blocking and Fail2Ban Jail for Repeat Offenders

Question - Verifying Googlebot #401

Open 9mido opened 3 years ago

9mido commented 3 years ago

If I install nginx-ultimate-bad-bot-blocker, do I need to do any sort of verification of Googlebot?

https://support.google.com/webmasters/answer/80553?hl=en

Since nginx-ultimate-bad-bot-blocker already blocks bad bots, shouldn't the real Googlebot simply be allowed through once the blocker has done its work, with no separate verification needed? Or am I mistaken, and do I still need some sort of reverse DNS verification of Googlebot? I'm not sure how to do that with nginx, so any tips in general would be appreciated.
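Google's help page linked above describes the check as a reverse DNS lookup on the requesting IP, a check that the resulting hostname is in googlebot.com or google.com, and a forward DNS lookup on that hostname that must return the original IP. Below is a minimal Python sketch of those steps using only the standard library. It runs outside nginx (for example over IP addresses taken from the access log) and is not part of nginx-ultimate-bad-bot-blocker. The function names are hypothetical, and the domain suffixes are an assumption to re-check against Google's current page, which may list more domains.

```python
import ipaddress
import socket
from typing import Callable, Iterable, List

# Assumed from Google's "Verifying Googlebot" page; re-check for additions.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_dns(ip: str) -> str:
    """PTR lookup: IP address -> hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_dns(hostname: str) -> List[str]:
    """A/AAAA lookup: hostname -> every address it resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(hostname, None)]


def is_google_hostname(hostname: str) -> bool:
    # Lower-case and drop the trailing dot of a fully qualified name.
    # The leading dot in each suffix stops "evilgooglebot.com" matching.
    return hostname.lower().rstrip(".").endswith(GOOGLE_SUFFIXES)


def verify_googlebot(
    ip: str,
    reverse: Callable[[str], str] = reverse_dns,
    forward: Callable[[str], Iterable[str]] = forward_dns,
) -> bool:
    """Hypothetical helper: True only if `ip` passes all three checks."""
    try:
        hostname = reverse(ip)                  # 1. reverse lookup
        if not is_google_hostname(hostname):    # 2. domain check
            return False
        addresses = forward(hostname)           # 3. forward lookup
    except OSError:  # socket.herror / socket.gaierror: lookup failed
        return False
    # The forward lookup must lead back to the same address. Parsing both
    # sides makes equivalent IPv6 spellings compare equal.
    wanted = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(a) == wanted for a in addresses)
```

The lookups are injectable so the logic can be exercised without network access. Real reverse lookups are slow and blocking, so this suits offline checks on logged IPs better than running on every request.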

hong823 commented 3 years ago

@9mido I don't think there's anything further to configure to allow Googlebot after installing this package.

After installing the package, you can check that Googlebot is not blocked with the following request (you should see an HTTP 200 status):

curl -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -I https://yourwebsite.com

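You can also check that bad bots are blocked. nginx records these requests with its non-standard 444 code, which closes the connection without sending any response, so curl reports an error such as an empty reply from the server instead of printing a 444 status: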

curl -A "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)" -I https://yourwebsite.com
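Because a 444 drops the connection without a reply, a script that repeats these two checks has to treat "no response" as the blocked case. A minimal Python sketch, assuming a hypothetical `probe()` helper built on the standard library's `http.client`; it returns the status code, or `None` when the server hangs up without answering:

```python
import http.client
import ssl
from typing import Optional
from urllib.parse import urlsplit


def probe(url: str, user_agent: str, timeout: float = 10.0) -> Optional[int]:
    """Send a HEAD request (like curl -I) with a custom User-Agent.

    Returns the HTTP status code, or None if the server closed the
    connection without sending a response (nginx's 444 behaviour).
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path, headers={"User-Agent": user_agent})
        return conn.getresponse().status
    except (ConnectionResetError, BrokenPipeError, ssl.SSLEOFError):
        # http.client.RemoteDisconnected subclasses ConnectionResetError;
        # an abrupt close on a TLS connection may surface as SSLEOFError.
        return None
    finally:
        conn.close()
```

For example, `probe("https://yourwebsite.com/", "Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)")` would be expected to return `None` on a site running the blocker. A refused connection is deliberately not caught, so a server that is simply down raises an error rather than looking "blocked".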
9mido commented 3 years ago

@hong823

That is what I figured: you would not need to do any Googlebot verification if you install the nginx-ultimate-bad-bot-blocker package. You never know whether that is truly the case, though. I have not tested your commands yet because my site is not in production.

But in this issue on the nginx-http-rdns GitHub project, https://github.com/flant/nginx-http-rdns/issues/10, there is a whole complex process for verifying Googlebot with the nginx rdns module, which is not really beginner-friendly and is no longer maintained by its developers.

A tutorial on this would make things a lot easier, and if it were incorporated into nginx-ultimate-bad-bot-blocker, even better. The Google link I posted leaves it up to interpretation, and I am not sure what to do.

mitchellkrogza commented 3 years ago

Googlebot will never be blocked by this blocker unless you block it yourself in the custom includes. Just remember the design:
a) we block known bad bots by their user-agent string;
b) we allow known good bots by their user-agent string;
c) everything else, being unspecified, is allowed as a result.
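The three rules above can be modelled as an ordered list of user-agent patterns where the first match decides and anything unmatched falls through to "allow". The Python sketch below is a toy model of that design with two hypothetical patterns; it is not the blocker's code, whose real lists live in its nginx configuration (conf.d/globalblacklist.conf) and are far longer.

```python
import re
from typing import List, Tuple

# Hypothetical, heavily trimmed rule list: (pattern, verdict), checked in order.
RULES: List[Tuple[re.Pattern, str]] = [
    (re.compile(r"\bGooglebot\b", re.IGNORECASE), "allow"),  # b) known good bot
    (re.compile(r"\bAhrefsBot\b", re.IGNORECASE), "block"),  # a) known bad bot
]


def classify(user_agent: str) -> str:
    """Return the verdict of the first matching rule."""
    for pattern, verdict in RULES:
        if pattern.search(user_agent):
            return verdict
    return "allow"  # c) anything not listed is allowed
```

Because the decision rests only on the User-Agent header, any client that sends a Googlebot-like string, such as `classify("curl/8.0 Googlebot")`, is treated the same as the real crawler; the DNS check on Google's help page is what tells the two apart.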

hong823 commented 3 years ago

@9mido I've been running this package in production on my website for months, and so far it's working fine for Googlebot.

I believe that in the GitHub project you shared, they are trying to build a regex-based list of allowed Google user agents for that module.

This package, by contrast, has already done that work for us. You can see the list at:

https://github.com/mitchellkrogza/nginx-ultimate-bad-bot-blocker/blob/09b6d10f487cec99fb27ceba2d0aca6952b15b6d/conf.d/globalblacklist.conf#L744-L762

So it just works out of the box. As @mitchellkrogza mentioned, if there is some special user agent you need to block, it has to be configured separately.