Add DOWNLOAD_DELAY=0.5 to Scrapy config

openhatch / oh-bugimporters

Bug importers for the OpenHatch project oh-mainline

https://oh-bugimporters.readthedocs.org/

GNU Affero General Public License v3.0

Add DOWNLOAD_DELAY=0.5 to Scrapy config #109

Closed (paulproteus closed 9 years ago)

paulproteus commented 9 years ago

At the time of writing, oh-bugimporters has trouble downloading all the bugs it wants from github.com.

@ehashman discovered that GitHub throttles API requests after 5000 per hour.
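For a rough sense of how a fixed per-request delay relates to an hourly limit like the one above, here is a back-of-the-envelope sketch. The function names are made up for illustration, and the model assumes a single domain paced at exactly one request per delay interval. Real throughput also depends on concurrency settings and on Scrapy's delay randomization (RANDOMIZE_DOWNLOAD_DELAY), so treat the numbers as estimates only.

```python
SECONDS_PER_HOUR = 3600


def max_requests_per_hour(delay_seconds: float) -> float:
    """Upper bound on requests per hour to one domain, assuming exactly
    one request is started every `delay_seconds` (simplified model)."""
    return SECONDS_PER_HOUR / delay_seconds


def min_delay_for_limit(requests_per_hour: int) -> float:
    """Smallest fixed delay (in seconds) that keeps the simplified model
    at or below `requests_per_hour`."""
    return SECONDS_PER_HOUR / requests_per_hour


print(max_requests_per_hour(0.5))  # 7200.0
print(min_delay_for_limit(5000))   # 0.72
```

Under this simplified model, a 0.5 second delay lowers the request rate considerably but does not by itself cap it below 5000 per hour; a delay of about 0.72 seconds would.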

The Scrapy DOWNLOAD_DELAY setting affects only "consecutive pages from the same website", so requests to different sites can still run in parallel after this change. However, since the setting applies to every domain and not just github.com, the crawl as a whole may still slow down.
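A minimal sketch of the proposed change, assuming the project keeps its Scrapy options in a standard settings module (the file name here is a guess, not taken from the repository):

```python
# settings.py (hypothetical location of the project's Scrapy settings)

# Wait 0.5 seconds between consecutive requests to the same website.
# This is a project-wide value, so it applies to every domain crawled.
DOWNLOAD_DELAY = 0.5
```

Scrapy reads DOWNLOAD_DELAY as a number of seconds, and fractional values such as 0.5 are accepted.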

ehashman commented 9 years ago

Looks good to me.