sangaline / wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
http://sangaline.com/post/wayback-machine-scraper/
ISC License

Inspired by warrick? #11

Closed: sandrobilbeisi closed this issue 3 years ago

sandrobilbeisi commented 4 years ago

Is wayback-machine-scraper meant to be a replacement for warrick?

To what extent is it inspired by or based on warrick? http://timetravel.mementoweb.org/

sangaline commented 3 years ago

It's not inspired by or based on warrick at all; I had never seen that project until you posted it. Skimming through their README, warrick seems focused on recovering a site as it existed at a specific point in time. That behavior is possible with wayback-machine-scraper, but it's also possible to download multiple versions of a page, specify time ranges, and filter URLs based on regular expressions. The core functionality of wayback-machine-scraper is also accessible through the scrapy-wayback-machine Scrapy middleware, so it's easy to integrate into Python/Scrapy projects. I wouldn't say that wayback-machine-scraper is aiming to be a replacement for warrick or any other project, just that they're projects in similar problem domains with different interfaces and slightly different target use cases.
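
To make the difference from single-point-in-time recovery concrete, here is a minimal Python sketch of the kind of selection described above: keeping several snapshots of a page, restricted to a time range and a regular expression. This is not wayback-machine-scraper's code or API. The function name `select_snapshots`, its parameters, and the example URLs are all hypothetical, and the only outside fact assumed is that Wayback Machine snapshot timestamps use the 14-digit `YYYYMMDDhhmmss` form.

```python
import re
from datetime import datetime


def select_snapshots(snapshots, start, end, pattern=None):
    """Keep snapshots captured within [start, end] whose URL matches pattern.

    `snapshots` is an iterable of (timestamp, url) pairs, where timestamp is
    a 14-digit Wayback-style string such as "20170101000000".
    This helper is hypothetical and only illustrates the idea.
    """
    regex = re.compile(pattern) if pattern else None
    selected = []
    for timestamp, url in snapshots:
        captured = datetime.strptime(timestamp, "%Y%m%d%H%M%S")
        if not (start <= captured <= end):
            continue  # outside the requested time range
        if regex is not None and not regex.search(url):
            continue  # URL does not match the regular expression
        selected.append((captured, url))
    # Sort chronologically so the result reads as a time series.
    return sorted(selected)


if __name__ == "__main__":
    # Made-up example data: four snapshots of pages on one site.
    example = [
        ("20150101000000", "http://example.com/news/a"),
        ("20160101000000", "http://example.com/about"),
        ("20170101000000", "http://example.com/news/b"),
        ("20200101000000", "http://example.com/news/c"),
    ]
    for captured, url in select_snapshots(
        example, datetime(2015, 6, 1), datetime(2018, 1, 1), r"/news/"
    ):
        print(captured.isoformat(), url)
```

A tool that recovers a site at one point in time would pick a single snapshot per URL; the sketch instead returns every matching snapshot in the range, which is what makes time series analysis possible.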