Closed solaceten closed 1 year ago
Further to my previous messages, here is an example of your misbehaving bot trying to access plugin files, none of which actually exist.
Below is a small example of many thousands of hits, all in the space of a few minutes.
IP:
207.241.230.103
207.241.232.92
207.241.230.131
207.241.232.89
Apache Server Status for 127.0.0.1 (via 127.0.0.1)
Server Version: Apache/2.4.57 (cPanel) OpenSSL/1.1.1t mod_bwlimited/1.4
Server MPM: prefork
Server Built:
Current Time: Friday, 19-May-2023 14:27:49 NZST
Server load: 38.38
Total accesses: 31188 - Total Traffic: 773.9 MB - Total Duration: 61212801
CPU Usage: u2.37 s8.75 cu13602.8 cs2568.91 - 38.7% CPU load
.746 requests/sec - 19.0 kB/second - 25.4 kB/request - 1962.7 ms/request
52 requests currently being processed, 47 idle workers

Srv PID Acc M CPU SS Req Dur Conn Child Slot Client Protocol VHost Request
0-0 4881 0/18/2007 W 15.04 29 0 4353753 0.0 0.76 53.28 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/schema-and-structured-data-for-wp/admin
1-0 8023 0/11/1926 5.43 0 26476 3802995 0.0 0.25 46.69 207.241.232.92 http/1.1 example.com:443 GET /wp-content/plugins/ultimate-social-media-icons/js/shuffle/
2-0 10855 0/4/2002 2.03 0 33757 4027619 0.0 0.11 51.60 207.241.232.92 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/css/boot
3-0 11510 0/3/1831 1.46 0 2426 4368181 0.0 0.02 46.56 207.241.230.103 http/1.1 example.com:443 GET /wp-content/uploads/the-core-style.css?ver=1667594027 HTTP/
4-0 10858 0/4/1760 2.94 2 10255 3437994 0.0 0.08 53.47 207.241.230.131 http/1.1 example.com:443 GET /wp-content/plugins/unyson/framework/extensions/shortcodes/
5-0 3437 1/31/1768 W 17.35 35 0 3242961 6.7 0.86 48.85 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/tickera-event-ticketing-system/css/elem
6-0 10859 0/3/1731 1.49 0 43765 3219504 0.0 0.07 40.28 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/tickera-event-ticketing-system/css/font
7-0 3438 0/29/1470 18.61 1 10331 3748227 0.0 0.47 36.51 207.241.230.131 http/1.1 example.com:443 GET /wp-content/plugins/wp-data-access/assets/js/wpda_restapi.
8-0 11530 0/3/1463 1.45 0 1446 3189077 0.0 0.02 30.93 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/js/waypo
9-0 8044 0/11/1378 W 10.88 30 0 2774315 0.0 0.30 31.36 207.241.232.89 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/css/coun
10-0 11531 0/2/1596 _ 1.10 2 14395 2616619 0.0 0.02 45.35 207.241.230.131 http/1.1 example.com:443 GET /wp-content/plugins/animated-number-counters/assets/js/anc-
11-0 11543 0/0/1382 W 0.00 43 0 2836848 0.0 0.00 31.04 207.241.232.90 http/1.1 example.com:443 GET /wp-content/plugins/tickera-event-ticketing-system/css/fron
12-0 2286 0/62/1296 W 40.61 41 0 2375666 0.0 2.09 35.78 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/unyson/framework/extensions/shortcodes/
13-0 11544 0/0/1152 W 0.00 43 0 2068651 0.0 0.00 26.34 207.241.232.89 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/css/font
14-0 11545 0/0/1106 W 0.00 42 0 2123248 0.0 0.00 29.86 207.241.230.131 http/1.1 example.com:443 GET /wp-content/plugins/tickera-event-ticketing-system/css/elem
15-0 8047 0/10/874 W 2.97 45 0 1312237 0.0 0.11 19.19 207.241.232.89 http/1.1 example.com:443 GET /wp-content/plugins/wp-data-access/assets/css/wpda_public.c
16-0 11546 0/0/928 W 0.00 42 0 2016284 0.0 0.00 24.29 207.241.232.89 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/css/boot
17-0 11559 0/0/658 W 0.00 42 0 883667 0.0 0.00 12.52 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/counter-number-showcase/assets/css/font
18-0 11569 0/0/608 W 0.00 42 0 1074856 0.0 0.00 17.50 207.241.232.90 http/1.1 example.com:443 GET /wp-content/plugins/unyson/framework/extensions/shortcodes/
19-0 11570 0/0/491 W 0.00 42 0 738297 0.0 0.00 11.43 207.241.230.103 http/1.1 example.com:443 GET /wp-content/plugins/tickera-event-ticketing-system/css/font
I am experiencing very high levels of traffic from your IPs.
207.241.230.103 207.241.232.92 207.241.230.131 207.241.232.90 207.241.232.89
I don't want to blacklist them, but I will have no choice if we cannot find a resolution.
Time: Fri Jun 16 09:56:01 2023 +1200
1 Min Load Avg: 37.34
5 Min Load Avg: 12.08
15 Min Load Avg: 5.72
Running/Total Processes: 46/402
Other people are reporting you as malicious.
https://www.abuseipdb.com/check/207.241.232.90
https://www.abuseipdb.com/check/207.241.230.131
https://www.abuseipdb.com/check/207.241.230.103
https://www.abuseipdb.com/check/207.241.232.92
https://www.abuseipdb.com/check/207.241.232.89
Your developers need to sort this out.
Please contact the Internet Archive directly, as this site is used for collaborative development, and they may miss complaints raised here.
See https://archive.org/about/contact.php or contact info@archive.org
I already contacted them directly and they have not replied in many weeks.
This is the last response I got... Each follow up I send receives nothing.
May 10, 2023, 17:23 PDT
I am a server administrator. I have been finding that the web crawler for archive.org has been crawling websites on our servers at an overwhelming rate, causing the servers to become unstable and run at very high load.
I am wondering if your crawler honours a robots.txt directive or equivalent that would allow us to slow down its crawl rate (similar to the Google search engine speed control mentioned here)?
Thank you for your assistance. Sol
===
Patron Services Yellow (Internet Archive) Internet Archive support@archivesupport.zendesk.com May 17, 2023, 19:50 PDT
I am very sorry! Can you please tell me which server(s) this is affecting? I will see what we can do. Thanks!
Mark Graham, Director, the Wayback Machine at the Internet Archive
==
Thank you for your reply.
Actually we have many servers and have noticed it across multiple sites.
Is there a robots.txt directive or equivalent that would allow us to slow down your crawler's crawl rate (similar to the Google search engine speed control mentioned here)?
Thanks
====
Patron Services Yellow (Internet Archive) Internet Archive support@archivesupport.zendesk.com May 17, 2023, 21:27 PDT
No... I am very sorry but we don't support that. I wish we did!
Mark Graham, Director, the Wayback Machine at the Internet Archive
====
May 18, 2023, 4:31 PM
Hmmm, well, that's a bit of a concern.
OK, we will have to develop a way to target your crawler and refuse connections after x requests within x amount of time.
Do you have a list of the crawler IP addresses, host names, user agents, or ASNs that your system uses?
Thanks
===
May 18, 2023, 4:31 PM
Further to my previous messages, here is an example of your misbehaving bot trying to access plugin files.
IP:
207.241.230.103
207.241.232.92
207.241.230.131
207.241.232.89
===
June 16, 2023, 4:31 PM
Hello again
I did not get any update or reply from you.
I am still experiencing very high levels of traffic from your IPs.
207.241.230.103 207.241.232.92 207.241.230.131 207.241.232.90 207.241.232.89
I don't want to blacklist them, but I will have no choice if we cannot find a resolution.
Time: Fri Jun 16 09:56:01 2023 +1200 1 Min Load Avg: 37.34
====
June 27, 2023, 10:03 AM
I still have not received any response.
Just to be clear: I don't work for the Internet Archive.
However, I do know they archive many thousands of sites every day. It is unlikely that they can track down any errant behaviour without more information from you. As they said:
Can you please tell me which server(s) this is affecting?
FWIW, in my experience, the most helpful thing is to have a snippet of your server logs that includes the hosts, URLs, user agents etc.
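As a rough illustration of the kind of summary that tends to help, here is a minimal Python sketch that tallies requests per client IP, user agent, and status code. It assumes your access log uses the Apache "combined" log format; the `summarize` function, the `SAMPLE` lines, the 192.0.2.x addresses, and the `ExampleBot/1.0` user agent are all made up for illustration and are not taken from this thread.

```python
import re
from collections import Counter

# Apache "combined" format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def summarize(lines):
    """Count requests per (client IP, user agent, status); skip lines that do not parse."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["ip"], match["agent"], match["status"])] += 1
    return counts


# Hypothetical sample lines, invented for this example only.
SAMPLE = [
    '192.0.2.10 - - [19/May/2023:14:27:40 +1200] "GET /wp-content/plugins/example/a.css HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.10 - - [19/May/2023:14:27:41 +1200] "GET /wp-content/plugins/example/b.js HTTP/1.1" 404 512 "-" "ExampleBot/1.0"',
    '192.0.2.20 - - [19/May/2023:14:27:42 +1200] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
    'this line is not in combined format and is ignored',
]

if __name__ == "__main__":
    for (ip, agent, status), hits in summarize(SAMPLE).most_common():
        print(f"{hits:6d}  {ip}  {status}  {agent}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize(open(path))` with `path` pointing at your own access log.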
Failing that, I suggest you look at configuring your web server to use IP-based rate limiting, to keep the volume of requests at a level you are comfortable with.
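For anyone unfamiliar with the idea, here is a minimal Python sketch of per-IP rate limiting using a sliding window: refuse a client once it has made more than a set number of requests within a set time span. The class name `SlidingWindowLimiter` and its parameters are hypothetical, and the limits shown are arbitrary. In practice this is usually enforced in the web server or firewall rather than in application code, so treat this only as a description of the logic.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client IP within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.hits = defaultdict(deque)  # ip -> timestamps of recently allowed requests

    def allow(self, ip):
        """Return True if the request should be served, False if it should be refused."""
        now = self.clock()
        recent = self.hits[ip]
        # Discard timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


if __name__ == "__main__":
    # Arbitrary example limits: 3 requests per 60 seconds per IP.
    limiter = SlidingWindowLimiter(max_requests=3, window_seconds=60)
    for i in range(5):
        print(i, limiter.allow("192.0.2.10"))
```

Each IP is tracked separately, so one noisy client being refused does not affect other visitors.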
OK, thanks. I did email them lots of server logs; I just snipped them above for brevity.
They are extremely poor at responding. I suspect this is simply too hard for them.
We already have rate-based limiting and that works fine, but it is a poor show when service providers are attacking servers. So I reached out to them for two reasons: 1) to ask if they knew it was happening, and 2) to see if they could slow it down.
Fail on both counts.