Closed ErikBorra closed 8 years ago
Thanks, @ErikBorra! Those are both really interesting ideas. Some thoughts/questions:
only retrieve unique snapshots
I'm thinking that using this option would retain only the chronologically earliest copy. Yeah? And it'd look something like waybackpack dol.gov -d ~/Downloads/dol-dot-gov --unique? Or --unique-only?
only retrieve snapshots closest to a particular set of dates (e.g. 1 July of each year)
This is intriguing, but feels like the additional complexity might outweigh the added functionality. What do you think the logic would look like for this? And how would, e.g., "1 July of each year", be expressed as arguments on the command line?
Hi @jsvine,
Yes to the first question.
As for the second, one could loop over the years (from 1996 until the current year) and pass the following as the timestamp when calling the Wayback API: YEAR0701000000. This way one retrieves a single version per year, the one closest to 1 July (the Wayback Machine does the 'closest' match for you).
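The yearly loop described above could be sketched roughly as follows. This is only an illustration, not waybackpack's code: it assumes the Wayback Availability endpoint (https://archive.org/wayback/available) with its url and timestamp parameters, and the helper names yearly_timestamps and availability_urls are made up for the example. It only builds the request URLs; fetching them is left out.

```python
from datetime import date
from urllib.parse import urlencode

# Assumption: the public Wayback Availability endpoint, which returns the
# snapshot closest to the requested timestamp.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def yearly_timestamps(first_year, last_year, month=7, day=1):
    """Return one 14-digit timestamp (YYYYMMDDhhmmss) per year, at midnight."""
    return [
        f"{year:04d}{month:02d}{day:02d}000000"
        for year in range(first_year, last_year + 1)
    ]


def availability_urls(site, first_year=1996, last_year=None):
    """Build one 'closest snapshot' request URL per year (hypothetical helper)."""
    if last_year is None:
        last_year = date.today().year
    return [
        AVAILABILITY_ENDPOINT + "?" + urlencode({"url": site, "timestamp": ts})
        for ts in yearly_timestamps(first_year, last_year)
    ]


urls = availability_urls("dol.gov", 1996, 1998)
for url in urls:
    print(url)
```

Each URL asks for the snapshot nearest to 1 July of one year, so the result is at most one version per year.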
And a third option: get one archived version per month, per 10 days, or per day by using collapse=timestamp:6, collapse=timestamp:7, or collapse=timestamp:8 respectively.
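To make the digit counts concrete: timestamps have the form YYYYMMDDhhmmss, so the first 6 digits identify a month, the first 7 a roughly ten-day block, and the first 8 a day. The sketch below imitates that collapsing locally on a handful of made-up timestamps and builds the matching CDX query URL. The endpoint is assumed to be http://web.archive.org/cdx/search/cdx, and cdx_url and collapse_timestamps are hypothetical helper names, not part of any library.

```python
from urllib.parse import urlencode

# Assumption: the Wayback CDX search endpoint.
CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def cdx_url(site, digits):
    """Build a CDX query that collapses on the first `digits` timestamp digits."""
    params = {"url": site, "collapse": f"timestamp:{digits}"}
    # safe=":" keeps the colon readable instead of encoding it as %3A.
    return CDX_ENDPOINT + "?" + urlencode(params, safe=":")


def collapse_timestamps(timestamps, digits):
    """Local imitation of collapsing: keep the first of each run of adjacent
    timestamps that share the same leading `digits` characters."""
    kept = []
    previous_prefix = None
    for ts in timestamps:
        prefix = ts[:digits]
        if prefix != previous_prefix:
            kept.append(ts)
        previous_prefix = prefix
    return kept


# Made-up sample data, in chronological order.
sample = ["20150101000000", "20150105000000", "20150112000000", "20150201000000"]

per_month = collapse_timestamps(sample, 6)     # one per YYYYMM
per_ten_days = collapse_timestamps(sample, 7)  # one per YYYYMMD
per_day = collapse_timestamps(sample, 8)       # one per YYYYMMDD
print(cdx_url("dol.gov", 6))
```

With this sample, collapsing on 6 digits keeps two snapshots, on 7 digits three, and on 8 digits all four.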
Re: Question 1 - Not sure how technically complex this would be, but if the script were to pull down the first complete copy of the site and then in subsequent folders pull down only files that are different, that would again be useful to my case of wanting a complete archive of my old sites.
Re: Question 1 - should be really simple, by specifying showDupeCount=true when calling the API.
Re: third option, and in addition to Re: Question 1: The collapse param can be used to further filter on month or days.
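For the unique-snapshots idea, one way to picture "retain only the chronologically earliest copy" is to group CDX rows by their content digest and keep the first row of each group. The sketch below does this locally on made-up rows; it does not call the API or use showDupeCount, and earliest_unique is a hypothetical helper name. Unlike adjacent-only collapsing, it treats any repeated digest as a duplicate, even when other versions appeared in between.

```python
def earliest_unique(snapshots):
    """Keep only the chronologically earliest snapshot for each digest.

    `snapshots` is a list of dicts with "timestamp" (YYYYMMDDhhmmss) and
    "digest" keys, mirroring two of the fields a CDX query returns.
    """
    seen_digests = set()
    unique = []
    # Fixed-width timestamp strings sort chronologically.
    for snap in sorted(snapshots, key=lambda s: s["timestamp"]):
        if snap["digest"] not in seen_digests:
            seen_digests.add(snap["digest"])
            unique.append(snap)
    return unique


# Made-up rows: the digests are placeholders, not real hashes.
rows = [
    {"timestamp": "20150301000000", "digest": "AAA"},
    {"timestamp": "20150101000000", "digest": "AAA"},
    {"timestamp": "20150201000000", "digest": "BBB"},
    {"timestamp": "20150401000000", "digest": "BBB"},
]

unique_rows = earliest_unique(rows)
for row in unique_rows:
    print(row["timestamp"], row["digest"])
```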
Thanks again for these suggestions! Version 0.3.0, now on the develop branch, includes both these features, and moves the library away from Memento TimeMaps to the CDX search. The new options are the --uniques-only flag and --collapse.
Along the lines of what you were hoping?
@jsvine awesome!
The README should probably be updated to reflect these additions. Also, it may be good to provide some examples of the collapse parameter in the documentation (either as feedback from the script, or in the wiki or similar).
Cheers,
Erik
Great! The changes haven't been merged into master yet; the new README can be found on the develop branch. It reflects the new additions, and links to documentation for the collapse parameter.
Now in master and pushed to PyPI.
Feature suggestions:
Keep up the good work.