tb0hdan / domains

World’s single largest Internet domains dataset
https://domainsproject.org
BSD 3-Clause "New" or "Revised" License

Crawler causing SYN floods #15

Status: Open · apienk opened this issue 2 years ago

apienk commented 2 years ago

Yesterday at 18:17 CEST we noticed a SYN flood caused by the project's crawler. Please implement request limits.

jordangarrison commented 2 years ago

Also experiencing this issue. We've had to block this project.

kalebdf commented 2 years ago

We also experienced a sudden flood of requests and are currently blocking this project (responding with 429 Too Many Requests). All the best!

tb0hdan commented 2 years ago

Hi guys,

I apologize for this unintended behavior. The DomainsProject crawler (https://github.com/tb0hdan/idun) has no port-scanning functionality of any kind and uses the plain "net/http" library for connections. I am already working on additional limits (on top of the existing robots.txt handling):

- delay/sleep between requests
- decreased number of connections to a single site
- HTTP 429 code handling

Thank you very much for reporting this. Issue will remain open for historical purposes after the fix.

Bmess1 commented 2 years ago

Thank you so much for the quick response!

On Jun 30, 2022, at 05:36, Bohdan Turkynewych @.***> wrote:

Hi guys,

I apologize for this unintended behavior. DomainsProject crawler (https://github.com/tb0hdan/idun) doesn't have any kind of port scanning functionality and uses plain "net/http" library for connections. I am already working on additional limits (on top of existing robots.txt handling):

- delay/sleep between requests
- decreased number of connections to a single site
- HTTP 429 code handling

Thank you very much for reporting this. Issue will remain open for historical purposes after the fix.


apienk commented 2 years ago

Thanks for the response. We will not blacklist your crawler for now.

tb0hdan commented 2 years ago