machawk1 / wail

:whale2: Web Archiving Integration Layer: One-Click User Instigated Preservation
https://matkelly.com/wail
MIT License
351 stars, 35 forks

Add dynamic fetching of the included sub-programs #66

Open machawk1 opened 10 years ago

machawk1 commented 10 years ago

For example, for Heritrix, fetch https://github.com/internetarchive/heritrix3/archive/master.zip with a standard GET request or, if you want to be fancy, use the dulwich module (https://github.com/jelmer/dulwich) to make a pure-Python git request for the latest revision. The first baby step is to define the appropriate http(s) URI for each package. After that, code is needed to update the platform-specific config files with the appropriate directories (e.g., the Heritrix jobs directory, files1 for Wayback, etc.).
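A minimal sketch of the plain-GET route, using only the standard library. The function name `fetch_and_extract` and the `PACKAGE_URIS` mapping are hypothetical, not part of WAIL; only the Heritrix URI comes from this comment.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

# Hypothetical mapping of package name to archive URI.
# Only the Heritrix entry is taken from this issue.
PACKAGE_URIS = {
    "heritrix": "https://github.com/internetarchive/heritrix3/archive/master.zip",
}


def fetch_and_extract(uri, dest_dir, timeout=60):
    """Download a zip archive with a plain GET and unpack it into dest_dir.

    Returns the list of member names found in the archive.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # urlopen issues a GET when no request body is supplied.
    with urllib.request.urlopen(uri, timeout=timeout) as response:
        payload = response.read()

    # The archive is held in memory; large packages may warrant
    # streaming to a temporary file instead.
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        archive.extractall(dest)
        return archive.namelist()
```

Something like `fetch_and_extract(PACKAGE_URIS["heritrix"], "bundledApps/heritrix")` would then pull the package, where the destination path is only a placeholder. The config rewriting step is not covered here.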

machawk1 commented 10 years ago

Here's a start:
https://github.com/internetarchive/heritrix3/archive/master.zip
https://github.com/iipc/openwayback/archive/master.zip
https://github.com/ariya/phantomjs/archive/master.zip
https://github.com/alard/warc-proxy/archive/master.zip
https://github.com/apache/tomcat/archive/trunk.zip

Q: Do the archive links provided by GitHub always return the latest from the repo? Also, we should probably record a hash for each package so we can tell when it needs to be updated, and be able to extract old configs and re-run the WAIL config injection process (Oy vey!).
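One possible shape for the hash bookkeeping, as a rough sketch: keep a small JSON manifest of SHA-256 digests and compare against it after each download. The names `needs_update` and `sha256_of` and the manifest layout are hypothetical. It also assumes the archive bytes change only when the repo content changes, which may not hold for archives generated on the fly.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(payload: bytes) -> str:
    """Return the hex SHA-256 digest of the downloaded bytes."""
    return hashlib.sha256(payload).hexdigest()


def needs_update(name, payload, manifest_path):
    """Compare payload's digest with the one recorded for this package.

    Returns False when the digest matches the manifest entry. Otherwise
    records the new digest in the manifest and returns True.
    """
    path = Path(manifest_path)
    # Hypothetical manifest layout: {"package name": "hex digest"}.
    manifest = json.loads(path.read_text()) if path.exists() else {}

    digest = sha256_of(payload)
    if manifest.get(name) == digest:
        return False

    manifest[name] = digest
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return True
```

When `needs_update` returns True, that would be the point to save the old configs and re-run the config injection; that part is left out here.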