propublica / upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
MIT License

HTML Comment on stashed pages with info #33

Closed by jeremybmerrill 10 years ago

jeremybmerrill commented 10 years ago

I had an Upton feature suggestion that would help with large scrapes like this. When Upton writes the scraped HTML out to the local copy, would it be possible to add some metadata about the page at the top, as an HTML comment? Something like <!-- Retrieved by Upton from http://www.somesite.com on January 15 at 4:28 p.m. --> That way you could preserve some information about the file even with human-readable filenames disabled.

Suggested by @esagara.
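
A minimal sketch of what the suggestion could look like in Ruby. This is not Upton's actual implementation: `stash_comment` and `annotate_html` are hypothetical helper names, and the ISO 8601 UTC timestamp is an assumed format chosen here because it is unambiguous, not something the suggestion specifies.

```ruby
require "time" # provides Time#iso8601

# Hypothetical helper, not part of Upton's API.
# Builds the HTML comment that records where and when a page was fetched.
def stash_comment(url, retrieved_at = Time.now)
  # "--" may not appear inside an HTML comment, so percent-encode it in the URL.
  safe_url = url.gsub("--", "%2D%2D")
  "<!-- Retrieved by Upton from #{safe_url} on #{retrieved_at.utc.iso8601} -->"
end

# Hypothetical helper: returns the page HTML with the comment on its own first line.
def annotate_html(html, url, retrieved_at = Time.now)
  "#{stash_comment(url, retrieved_at)}\n#{html}"
end

puts annotate_html("<html></html>", "http://www.somesite.com", Time.utc(2014, 1, 15, 16, 28))
# <!-- Retrieved by Upton from http://www.somesite.com on 2014-01-15T16:28:00Z -->
# <html></html>
```

Because the comment sits on its own first line, the original markup can be recovered by dropping that line when the stashed copy is read back.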

jeremybmerrill commented 10 years ago

@esagara, check it out.