sangaline / wayback-machine-scraper

A command-line utility and Scrapy middleware for scraping time series data from Archive.org's Wayback Machine.
http://sangaline.com/post/wayback-machine-scraper/
ISC License

Would it be possible to add functionality to download screenshots? #10

Closed: alexgarciab closed this issue 3 years ago

alexgarciab commented 4 years ago

I would like to know whether it would be possible to add functionality that downloads a screenshot of each version of a website over time, based on Wayback Machine data. This would be very useful for us, because a human can review screenshots much faster than raw HTML.

sangaline commented 3 years ago

This would introduce a dependence on a full browser and a browser automation framework like Selenium, and browser rendering generally doesn't play particularly well with Scrapy. I can see the utility in something like this, but it would require a substantial rewrite of the project and a significant shift in philosophy and scope.

If you have a use for a tool like this, my recommendation would be to either search around for something that uses Selenium or Puppeteer already, or to code it from scratch using one of those libraries.
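For anyone following the "code it from scratch" route, here is a minimal sketch of what that might look like. It is not part of wayback-machine-scraper. It assumes the Internet Archive's CDX server at `https://web.archive.org/cdx/search/cdx` (with its `url`, `output=json`, `fl`, `filter`, `collapse`, `from` and `to` parameters) to list snapshots, and Selenium with headless Chrome to render and screenshot each one. All function names (`cdx_query_url`, `parse_cdx_rows`, `snapshot_url`, `list_snapshots`, `screenshot_snapshots`) are hypothetical, and Selenium plus a Chrome install are assumed to be available.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed CDX endpoint of the Wayback Machine (not part of this project).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(target, from_ts=None, to_ts=None):
    """Build a CDX query listing successful (HTTP 200) captures of `target`."""
    params = {
        "url": target,
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "digest",  # skip adjacent captures with identical content
    }
    if from_ts:
        params["from"] = from_ts
    if to_ts:
        params["to"] = to_ts
    return CDX_ENDPOINT + "?" + urllib.parse.urlencode(params)


def parse_cdx_rows(rows):
    """Turn CDX JSON rows (first row is the header) into (timestamp, url) pairs."""
    if not rows:
        return []
    header, *data = rows
    ts_i, orig_i = header.index("timestamp"), header.index("original")
    return [(row[ts_i], row[orig_i]) for row in data]


def snapshot_url(timestamp, original):
    """Wayback Machine URL for a single capture."""
    return f"https://web.archive.org/web/{timestamp}/{original}"


def list_snapshots(target, **kwargs):
    """Fetch the CDX listing over the network and return snapshot URLs."""
    with urllib.request.urlopen(cdx_query_url(target, **kwargs)) as resp:
        rows = json.load(resp)
    return [snapshot_url(ts, orig) for ts, orig in parse_cdx_rows(rows)]


def screenshot_snapshots(urls, out_dir="screenshots"):
    """Render each snapshot in headless Chrome and save a PNG named by timestamp."""
    from selenium import webdriver  # optional dependency, imported only when needed

    os.makedirs(out_dir, exist_ok=True)
    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.set_window_size(1280, 1024)
        for url in urls:
            timestamp = url.split("/web/", 1)[1].split("/", 1)[0]
            driver.get(url)
            driver.save_screenshot(os.path.join(out_dir, f"{timestamp}.png"))
    finally:
        driver.quit()  # always release the browser, even on errors
```

Usage might look like `screenshot_snapshots(list_snapshots("example.com", from_ts="2015", to_ts="2020"))`. Keeping the browser out of the crawling step mirrors the maintainer's point: the snapshot listing is plain HTTP, and only the rendering needs a full browser.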