McXD closed this 1 month ago
The limit is not 240 PPM in general but 240 PPM per host. This protects the target host and avoids complaints from host owners. It is enough to load 14400 pages from one host in one hour, which is usually far more than the host has to offer.
If you run a wide crawl (e.g. 100 hosts at the same time), the combined limit is 24000 pages per minute. That should be enough...
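The arithmetic behind those two figures can be checked with a short sketch. The function and constant names are made up for illustration and are not part of YaCy; only the 240 PPM per-host value comes from this thread.

```python
# Hypothetical helper names; only the 240 PPM figure is taken from the thread.
PER_HOST_PPM = 240  # per-host cap in pages per minute


def pages_per_hour(ppm: int) -> int:
    """Pages one host can deliver in an hour at the given per-minute rate."""
    return ppm * 60


def aggregate_ppm(hosts: int, per_host_ppm: int = PER_HOST_PPM) -> int:
    """Combined pages per minute when crawling several hosts in parallel."""
    return hosts * per_host_ppm


print(pages_per_hour(PER_HOST_PPM))  # single host, one hour
print(aggregate_ppm(100))            # 100 hosts crawled at once
```

The cap scales linearly with the number of hosts, so a single-host crawl is the only case where 240 PPM is the overall ceiling.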
The limit does NOT apply if you are running this in an intranet. In that case you own the hosts and can put as much load on them as you want.
Without the limit YaCy would be a DoS tool. We do not want YaCy to be used that way, so the per-host limit should stay.
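To illustrate what a per-host limit means in practice, here is a minimal sketch of a per-host throttle. It is not YaCy's implementation; the class and method names are hypothetical, and it only assumes that 240 PPM translates into a minimum gap of 60 / 240 = 0.25 seconds between requests to the same host.

```python
class PerHostThrottle:
    """Illustrative per-host rate limiter (hypothetical, not YaCy code)."""

    def __init__(self, pages_per_minute: int) -> None:
        # Minimum number of seconds between two requests to one host.
        self.min_interval = 60.0 / pages_per_minute
        # Earliest time (in seconds) at which each host may be hit again.
        self._next_free = {}

    def reserve(self, host: str, now: float) -> float:
        """Return the time at which the next request to `host` may start."""
        start = max(now, self._next_free.get(host, now))
        self._next_free[host] = start + self.min_interval
        return start


throttle = PerHostThrottle(240)
first = throttle.reserve("a.example", 0.0)   # first request goes out at once
second = throttle.reserve("a.example", 0.0)  # same host has to wait
other = throttle.reserve("b.example", 0.0)   # a different host is not delayed
```

Because the state is kept per host, adding more hosts raises the total throughput while each individual host still sees at most 240 requests per minute.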
I am using YaCy to index files stored on my own sites, primarily company filings downloaded from EDGAR. When I started the crawl, I noticed that the speed is capped at 240 pages per minute (PPM). The 'Load Web Page, Crawl' page states:
Since I am crawling my own server, throttling the load is not a concern. How can I remove or adjust this limit to increase the crawl speed?
Any help is appreciated!