mitchellkrogza / nginx-ultimate-bad-bot-blocker

Nginx Block Bad Bots, Spam Referrer Blocker, Vulnerability Scanners, User-Agents, Malware, Adware, Ransomware, Malicious Sites, with anti-DDOS, Wordpress Theme Detector Blocking and Fail2Ban Jail for Repeat Offenders

[BUG] Amazonbot bypasses the script #585

Closed londonuk371 closed 2 months ago

londonuk371 commented 2 months ago

Describe the bug

Amazonbot is bypassing the script, yet all other bots are blocked.

access.log:

223.109.252.182 - - [23/Aug/2024:10:06:13 +0000] "GET / HTTP/1.1" 444 0 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"

444: It's working fine for Sogou.

52.70.240.171 - - [23/Aug/2024:10:06:01 +0000] "GET /au/shop/Macros HTTP/1.1" 200 15123 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"

200: Not working for Amazonbot. I get more than 45,000 requests per day from Amazonbot, counted with this command line: cat /var/log/nginx/mywebsite.com-access.log | awk -F'"' '{print $6}' | sort | uniq -c | sort -n | tail -n 10
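For readers who prefer a script to the shell pipeline, here is a minimal Python sketch of the same count. It assumes the log uses the combined format shown above, where splitting a line on double quotes leaves the User-Agent in the sixth field (awk's $6). The function name `top_user_agents` is hypothetical and is not part of the blocker.

```python
from collections import Counter


def top_user_agents(lines, n=10):
    """Return the n most frequent User-Agent strings as (agent, count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split('"')
        # Index 5 here corresponds to awk's $6 (awk fields are 1-based).
        # Lines with too few quoted fields are skipped.
        if len(fields) > 5:
            counts[fields[5]] += 1
    return counts.most_common(n)


# The two log lines quoted in this issue, used as sample input.
sample = [
    '223.109.252.182 - - [23/Aug/2024:10:06:13 +0000] "GET / HTTP/1.1" 444 0 "-" '
    '"Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"',
    '52.70.240.171 - - [23/Aug/2024:10:06:01 +0000] "GET /au/shop/Macros HTTP/1.1" 200 15123 "-" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) '
    'Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)"',
]

for agent, count in top_user_agents(sample):
    print(count, agent)
```

To run it on a real log, pass an open file object instead of `sample`, for example `top_user_agents(open("/var/log/nginx/mywebsite.com-access.log"))`.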

To Reproduce

It's a fresh install with all updates.

I installed the Chrome extension "Custom User-Agent List" (https://chromewebstore.google.com/detail/user-agent-switcher-for-c/djflhoibgkdhkhhcedjiklpkjnoahfmg) and created User-Agent strings identical to the ones the bots send:

GPTBot: Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.2; +https://openai.com/gptbot)

Amazonbot: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 (KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)

Sogou: Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)

GPTBot and Sogou are blocked correctly, but Amazonbot is not, and I don't understand why.

Expected behavior

Amazonbot should be blocked.

Server (please complete the following information):

Ubuntu 22.04

Post output of uname -a here

Linux ubuntu-uk 5.15.0-112-generic #122-Ubuntu SMP Thu May 23 07:48:21 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

nginx version: nginx/1.18.0 (Ubuntu)

Paste output of sudo nginx -V here

nginx version: nginx/1.18.0 (Ubuntu)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -ffile-prefix-map=/build/nginx-zctdR4/nginx-1.18.0=. -flto=auto -ffat-lto-objects -flto=auto -ffat-lto-objects -fstack-protector-strong -Wformat -Werror=format-security -fPIC -Wdate-time -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -flto=auto -ffat-lto-objects -flto=auto -Wl,-z,relro -Wl,-z,now -fPIC' --prefix=/usr/share/nginx --conf-path=/etc/nginx/nginx.conf --http-log-path=/var/log/nginx/access.log --error-log-path=/var/log/nginx/error.log --lock-path=/var/lock/nginx.lock --pid-path=/run/nginx.pid --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --add-dynamic-module=/build/nginx-zctdR4/nginx-1.18.0/debian/modules/http-geoip2 --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module

londonuk371 commented 2 months ago

Sorry for the misunderstanding, I should have tested the customizable “nginx-ultimate-bad-bot-blocker” configuration before sending this message.

By adding the following to the file bots.d/blacklist-user-agents.conf:

# ------------
# MY BLACKLIST
# ------------
"~*(?:\b)Amazonbot(?:\b)" 3;

I was able to block this nasty bot, which ignores the robots.txt file and makes more than 40,000 requests per day on the server!
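To see why this entry catches the Amazonbot string and leaves the others alone, the pattern can be tried against the three User-Agent strings from the report. This is only a rough check in Python: nginx evaluates the pattern with PCRE, and the sketch assumes that Python's `re` module with `re.IGNORECASE` behaves the same way for this simple expression (the leading `~*` in the nginx entry marks a case-insensitive regex and is not part of the pattern). The helper name `matches_blacklist` is hypothetical.

```python
import re

# Pattern copied from the blacklist entry, without nginx's "~*" prefix.
PATTERN = re.compile(r"(?:\b)Amazonbot(?:\b)", re.IGNORECASE)

# User-Agent strings quoted earlier in this issue.
USER_AGENTS = {
    "GPTBot": "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
              "GPTBot/1.2; +https://openai.com/gptbot)",
    "Amazonbot": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/600.2.5 "
                 "(KHTML, like Gecko) Version/8.0.2 Safari/600.2.5 "
                 "(Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)",
    "Sogou": "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)",
}


def matches_blacklist(user_agent):
    """Return True if the custom blacklist pattern occurs anywhere in the string."""
    return PATTERN.search(user_agent) is not None


for name, user_agent in USER_AGENTS.items():
    print(name, matches_blacklist(user_agent))
```

Only the Amazonbot string matches this entry. GPTBot and Sogou were already blocked before the entry was added, so they are caught elsewhere in the blocker's configuration rather than by this custom line.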

mitchellkrogza commented 2 months ago

The custom config files are your best friend ;)