Closed b5 closed 5 years ago
I'd like to break this PR into two parts: the first to land just a proof-of-concept workflow so others can play with it, and then a follow-up PR to incorporate feedback. This second PR should be where a lot of bug closing happens, and where we get out of proof-of-concept land and into usable-code land. By doing it this way, I'm hoping a brave soul or two will be willing to use this buggy code to help write some initial documentation.
Love this.
From review on a call:
walk server
/captures/resolved/now behaves differently than /captures/resolved/now/
/captures/resolved/now/https://youtube.com redirects to /captures/resolved/now/https:/youtube.com (note one slash after the protocol); this should probably either work fine or redirect to a version with the protocol removed
(... /captures/resolved/now/ above)
(... X-Archive-Orig-)
Oooook I think this is ready for review. Lots to do, but from our latest round of feedback:
walk server
/captures/resolved/now behaves differently than /captures/resolved/now/
/captures/resolved/now/https://youtube.com redirects to /captures/resolved/now/https:/youtube.com (note one slash after the protocol); this should probably either work fine or redirect to a version with the protocol removed
(... /captures/resolved/now/ above) haven't managed to fix the 500 response code, but the rest is good
config.Worker.RecordResponseHeaders: true
(... X-Archive-Orig-) (I vote we punt on this until we need it; it'll take maybe 2 hrs)
I've also added a feature I think is important for this phase: the capacity to fetch seeds from a file or URL. A new string configuration property, config.Coordinator.SeedsPath, lets you supply a string that's either a URL or a filepath (relative to pwd, or absolute) pointing to a newline-delimited list of URLs to seed. I've been using this to point at the raw text from @Mr0grog's gist: https://gist.github.com/Mr0grog/40cdcd56b048d7f00b0d47d3aca70be0/raw/c6ad8f6f55c93ab46b033d4486033b249c8b65db/webmonitoring_active_urls.txt
The biggest thing I'd like to tackle next:
But all in all, I think we should merge this & start on documentation & testing for the web monitoring use case. Would love to hear if others agree.
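For the single-slash redirect and the trailing-slash mismatch in the feedback list above, one possible approach is to tolerate the collapsed form instead of fighting the redirect. Go's standard http.ServeMux cleans request paths and redirects to the cleaned version, which would explain https:// becoming https:/, though whether walk routes through ServeMux is an assumption here. The sketch below uses a hypothetical targetURL helper that restores the missing slash and treats the path with and without a trailing slash the same way:

```go
package main

import (
	"regexp"
	"strings"
)

// collapsedScheme matches an http or https scheme followed by exactly one
// slash, which is what remains after path cleaning collapses "//" into "/".
var collapsedScheme = regexp.MustCompile(`^(https?):/([^/])`)

// targetURL extracts the captured URL from a request path under prefix
// (for example "/captures/resolved/now") and restores a collapsed scheme.
// It returns "" when no URL follows the prefix, with or without a trailing
// slash. (Hypothetical helper, not part of walk.)
func targetURL(requestPath, prefix string) string {
	rest := strings.TrimPrefix(requestPath, prefix)
	rest = strings.TrimPrefix(rest, "/")
	return collapsedScheme.ReplaceAllString(rest, "${1}://${2}")
}
```

With this helper, /captures/resolved/now/https:/youtube.com and /captures/resolved/now/https://youtube.com both resolve to https://youtube.com, so the redirect becomes harmless rather than something the server has to prevent.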
Alright, after today's team call I'm re-fired up to take a run at saving this project. I'm going to merge this b/c we're already waaaaaay past master, and I can't write a proper "getting started" readme without doing some of the work we agreed needs doing first in today's call.
Opening this PR now so others can track progress & add comments.
Once finished, this PR should be an initial implementation of #16, and allow the following flow:
(... walk start)
(... walk start again, with a different target dir)
(... walk server)
Once that's possible, we merge, party, and move on to filing lots of bugs.
After that, only three things stand in our way of an initial staging server: