mitchellkrogza / apache-ultimate-bad-bot-blocker

Apache Block Bad Bots, (Referer) Spam Referrer Blocker, Vulnerability Scanners, Malware, Adware, Ransomware, Malicious Sites, Wordpress Theme Detectors and Fail2Ban Jail for Repeat Offenders

[ADD/REMOVE] Seekport Crawler is a misbehaving bot. #162

Status: Closed (g7morris closed this issue 6 months ago)

g7morris commented 2 years ago

Is this an Addition / Removal Request? Addition. Please and thank you!

Please List the User-Agent string or Referrer to be added/removed example: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/webkit-version (KHTML, like Gecko) Silk/browser-version like Chrome/chrome-version Safari/webkit-version

Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)

Please explain why it should be added/removed

The bot / user agent doesn't respect the current robots.txt, .htaccess, or even the Apache bad bot settings whatsoever, and it winds up toppling servers with an insane number of constant requests.

We had to block the offending IP at the firewall level.

For Additions: Please include a log sample 3-5 lines is adequate

I've sanitized the institution domain but you get the drift. The IP is real. We're running Apache in a Docker container which works really well. Big fans of your software.

The requests first come through a Traefik Docker container (reverse proxy), which then forwards them to the Apache container running Bad Bot. Typically, blocking by user agent does the trick right away, but apparently not this time.

isle-proxy-prod    | 65.21.180.83 - - [13/Oct/2021:17:49:06 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1901-01-01T00:00:00Z%20TO%201911-01-01T00:00:00Z] HTTP/1.1" 200 45570 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5533 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5456ms
isle-proxy-prod    | 65.21.180.83 - - [13/Oct/2021:17:49:07 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1911-01-01T00:00:00Z%20TO%201921-01-01T00:00:00Z] HTTP/1.1" 200 47252 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5534 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5193ms
isle-proxy-prod    | 65.21.180.83 - - [13/Oct/2021:17:49:08 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1921-01-01T00:00:00Z%20TO%201931-01-01T00:00:00Z] HTTP/1.1" 200 45724 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5535 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 5119ms
isle-proxy-prod    | 65.21.180.83 - - [13/Oct/2021:17:49:11 +0000] "GET /collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22&f[1]=mods_originInfo_dateCreated_mdt:[1941-01-01T00:00:00Z%20TO%201971-01-01T00:00:00Z] HTTP/1.1" 200 50603 "https://institution.example.org/collections/academic-departments-and-programs-records?islandora_solr_search_navigation=0&f[0]=mods_typeOfResource_ms:%22text%22" "Mozilla/5.0 (compatible; Seekport Crawler; http://seekport.com/)" 5537 "Host-PathPrefix-cantaloupe-0" "http://192.168.80.9:80" 4921ms

Any other important information to consider

Despite adding the following to blacklist-user-agents.conf, 3/4 of the requests were still coming through. Oddly, some requests to the site homepage were blocked, but the more elaborate requests pushed through.

# Custom - 10/13 Seekport Crawler - http://seekport.com/
BrowserMatchNoCase "^(.*?)(\bSeekport\ Crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport\ crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bSeekport \Crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport \crawler\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bSeekport\b)(.*)$" bad_bot
BrowserMatchNoCase "^(.*?)(\bseekport.com\b)(.*)$" bad_bot
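
Editorial note on the rules above, offered as a minimal sketch and not as the maintainer's fix or a diagnosis of why requests still got through. BrowserMatchNoCase (mod_setenvif) already matches case-insensitively, so the upper- and lower-case variants are redundant. In the third and fourth rules the backslash sits before `C`/`c` instead of before the space, and `\C` and `\c` are escape sequences in PCRE rather than literal letters, so those two patterns may not mean what was intended. Assuming the same `bad_bot` variable and pattern style used above, a single rule should cover the reported user agent:

```apache
# Sketch only: one case-insensitive rule in place of the seven variants above.
# BrowserMatchNoCase ignores case, so "Seekport" also matches "seekport".
# \bSeekport\b matches both "Seekport Crawler" and "http://seekport.com/"
# inside the User-Agent header, so no separate rules are needed for those.
BrowserMatchNoCase "^(.*?)(\bSeekport\b)(.*)$" bad_bot
```

This only sets the `bad_bot` environment variable. Whether a request is actually refused depends on the access rules that act on that variable, which are not shown in this thread.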

mitchellkrogza commented 6 months ago

Seekport was added some time ago. Please confirm whether you are still having issues with it.

g7morris commented 6 months ago

I filed this ticket before the update made on Jan 21, 2022.

No worries, I just forgot to close this ticket. Thank you for the hard work.