medialab / sandcrawler

sandcrawler.js - the server-side scraping companion.
http://medialab.github.io/sandcrawler/
GNU Lesser General Public License v3.0

Can I scrape up to 50,000 pages in a reasonable time? #193

Open scroobius-pip opened 7 years ago

scroobius-pip commented 7 years ago

Is this library suitable for scraping data from a large number of pages?

Yomguithereal commented 7 years ago

Hello @scroobius-pip. This library is indeed suitable for scraping a large number of pages. However, what counts as a "reasonable time"? Usually, when scraping, the bottleneck is the sites you are hitting rather than your own computing power.
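
To put a rough number on "reasonable time", here is a back-of-the-envelope sketch in plain JavaScript. It does not use sandcrawler's API. The helper `estimateCrawlSeconds` is hypothetical, and the figures (1 second per request, 4 concurrent requests) are assumptions for illustration, not measurements of any real site.

```javascript
// Rough estimate of total crawl time when the remote site is the bottleneck.
// Assumes every request takes the same time and requests run in fixed-size batches.
function estimateCrawlSeconds(pageCount, secondsPerRequest, concurrency) {
  // Each batch of `concurrency` requests takes about one request's duration.
  const batches = Math.ceil(pageCount / concurrency);
  return batches * secondsPerRequest;
}

// Assumed values: 50,000 pages, 1 second per request, 4 requests in parallel.
const seconds = estimateCrawlSeconds(50000, 1, 4);
const hours = (seconds / 3600).toFixed(1);
console.log(hours + ' hours');
```

Under these assumptions the crawl takes about 3.5 hours. Doubling the concurrency would roughly halve that, provided the target site tolerates the extra load.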