The source website is very sensitive to many API requests arriving at the same time. I was wondering:
What exact values did you use for these settings in the settings file when you scraped the data?
CONCURRENT_REQUESTS = ?
# Configure a delay for requests for the same website (default: 0)
# See https://docs.scrapy.org/en/latest/topics/settings.html#download-delay
# See also autothrottle settings and docs
DOWNLOAD_DELAY = ?
# The download delay setting will honor only one of:
CONCURRENT_REQUESTS_PER_DOMAIN = ?
CONCURRENT_REQUESTS_PER_IP = ?
How much time did it take to scrape the full data (41M samples)?
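To show why the settings matter for the total time, here is a rough back-of-the-envelope sketch. It assumes one request per sample, a single domain, and a simplified throughput model: the rate is limited by concurrency divided by average latency, and further capped at about one request per `DOWNLOAD_DELAY` seconds when a delay is set. The function name, the latency parameter, and the example values are hypothetical, not the values actually used for the crawl.

```python
def estimate_scrape_hours(n_requests, download_delay, concurrency, avg_latency):
    """Rough wall-clock estimate (hours) for crawling one domain.

    Simplified model, not Scrapy's exact scheduler behaviour.
    """
    # Requests per second allowed by concurrency alone.
    rate = concurrency / avg_latency
    if download_delay > 0:
        # A per-domain delay spaces consecutive requests,
        # so the rate cannot exceed roughly 1 / delay.
        rate = min(rate, 1.0 / download_delay)
    return n_requests / rate / 3600.0


if __name__ == "__main__":
    # Hypothetical values: 0.25 s delay, 8 concurrent requests, 0.5 s latency.
    hours = estimate_scrape_hours(41_000_000, 0.25, 8, 0.5)
    print(f"{hours:.0f} hours, about {hours / 24:.0f} days")
```

Under this model, even a modest delay dominates the total time for 41M requests, which is why the exact values are of interest.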
Is there any way you can share the scraped data? That way, the source website won't have to respond to millions of additional API requests.