Open machawk1 opened 10 years ago
Here's a start:
https://github.com/internetarchive/heritrix3/archive/master.zip
https://github.com/iipc/openwayback/archive/master.zip
https://github.com/ariya/phantomjs/archive/master.zip
https://github.com/alard/warc-proxy/archive/master.zip
https://github.com/apache/tomcat/archive/trunk.zip
Q: Do the archive links provided by GitHub always return the latest state of the branch? Also, we should probably record a hash for each download so we can tell when a package needs to be updated, and we need to be able to extract the old configs and re-run the WAIL config injection process (Oy vey!).
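A rough sketch of the hash idea, assuming a small JSON manifest that maps each package name to the SHA-256 of the last archive we downloaded. The names `sha256_of`, `needs_update`, and the manifest file are hypothetical, not existing WAIL code. It also assumes the archive bytes only change when the repo content changes, which GitHub may not strictly guarantee for generated zips.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_update(name, archive_path, manifest_path):
    """Compare a freshly downloaded archive against the recorded hash.

    Returns True (and records the new hash) if the archive differs from
    what the manifest holds for `name`, or if nothing was recorded yet.
    The manifest layout is an assumption made for this sketch.
    """
    manifest_path = Path(manifest_path)
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())

    current = sha256_of(archive_path)
    changed = manifest.get(name) != current
    if changed:
        manifest[name] = current
        manifest_path.write_text(json.dumps(manifest, indent=2))
    return changed
```

Only when `needs_update` returns True would we need to unpack the archive, pull out the old configs, and re-run the injection step.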
e.g., for Heritrix, fetch https://github.com/internetarchive/heritrix3/archive/master.zip using a standard GET request or, if you want to be fancy, use the dulwich module ( https://github.com/jelmer/dulwich ) to do a pure-Python git fetch of the latest. The first baby step is defining the appropriate http(s) URIs for each package. Following that, code is needed to update the (platform-specific) config files with the appropriate directories (e.g., the Heritrix jobs directory, files1 for Wayback, etc.).
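Something like the following could cover the plain GET route and the config step. It is only a sketch: the URIs are the ones listed earlier in this thread, while `PACKAGE_URIS`, `fetch_package`, `inject_paths`, and the `@JOBS_DIR@`-style placeholder are hypothetical names made up for illustration. How the real config files mark their directories would need to be checked per package and platform.

```python
import shutil
import urllib.request
from pathlib import Path

# URIs taken from the list earlier in this issue.
PACKAGE_URIS = {
    "heritrix": "https://github.com/internetarchive/heritrix3/archive/master.zip",
    "openwayback": "https://github.com/iipc/openwayback/archive/master.zip",
    "phantomjs": "https://github.com/ariya/phantomjs/archive/master.zip",
    "warc-proxy": "https://github.com/alard/warc-proxy/archive/master.zip",
    "tomcat": "https://github.com/apache/tomcat/archive/trunk.zip",
}


def fetch_package(name, dest_dir, uris=PACKAGE_URIS):
    """Download one package archive with a plain GET and return its path."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    # Name the file after the package; most URIs end in the same "master.zip".
    dest = dest_dir / f"{name}.zip"
    with urllib.request.urlopen(uris[name]) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def inject_paths(config_text, replacements):
    """Replace placeholder tokens in a config file's text with directories.

    The placeholder convention (e.g. "@JOBS_DIR@") is an assumption for
    this sketch, not something the packages ship with.
    """
    for placeholder, directory in replacements.items():
        config_text = config_text.replace(placeholder, str(directory))
    return config_text
```

Usage would be along the lines of `fetch_package("heritrix", "downloads")`, followed by `inject_paths(text, {"@JOBS_DIR@": jobs_dir})` on each config file read from the unpacked archive.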