oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS
MIT License

[wish] option to force 404 on any resources not in cdxj #340

Open akavel opened 6 years ago

akavel commented 6 years ago

Would it be possible to have an option in ipwb (perhaps via the ServiceWorker?) that forces any URL not found in the CDXJ to fail with a 404? That way I could be sure that what I see is built without any external online resources, and I could check whether it is readable as such. In other words, I could assess whether it's "backed-up well enough for me".

As a workaround, I imagine I could set up a proxy that blocks any connection to the Internet, or just turn off my WiFi?

(Somewhat related to #335)
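To make the request concrete, here is a minimal Python sketch of the check being asked for: given a CDXJ index and a list of URLs a page depends on, report which ones have no capture. Everything in it is an assumption made for illustration, not ipwb's own code: the helper names (`to_surt`, `load_index`, `missing_from_index`) are hypothetical, the SURT conversion is heavily simplified, the index line layout (key, 14-digit datetime, JSON block, with `!`-prefixed metadata lines) is assumed, and the sample JSON blocks are left empty.

```python
from urllib.parse import urlsplit


def to_surt(url):
    """Simplified SURT-style key: reversed host labels, then ')' and the path.

    Assumption: real SURT canonicalization covers more cases (ports,
    query strings, "www." prefixes) than this sketch does.
    """
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = ",".join(reversed(parts.hostname.lower().split(".")))
    return f"{host}){parts.path or '/'}"


def load_index(cdxj_text):
    """Map each key to the set of 14-digit datetimes indexed for it."""
    index = {}
    for line in cdxj_text.splitlines():
        # Skip blank lines and (assumed) '!'-prefixed metadata lines.
        if not line.strip() or line.startswith("!"):
            continue
        key, datetime14, _json_block = line.split(" ", 2)
        index.setdefault(key, set()).add(datetime14)
    return index


def missing_from_index(urls, index):
    """Return the URLs that have no capture at all in the index."""
    return [u for u in urls if to_surt(u) not in index]


# Illustrative index lines; the JSON blocks are intentionally empty.
sample_cdxj = """\
us,memento)/ 20130202100000 {}
us,memento)/ 20140114100000 {}
"""

index = load_index(sample_cdxj)
leaks = missing_from_index(
    ["http://memento.us/", "http://example.com/style.css"], index
)
print(leaks)
```

Any URL this reports would be a candidate for the forced 404 rather than a fetch from the live web.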

ibnesayeed commented 6 years ago

The intent of the Reconstructive ServiceWorker is to prevent any live leaks. This means that IPWB (which now uses Reconstructive) makes sure that non-local resources are served only from the archive and not from the live web. Currently there are some implementation issues, but those are known bugs that will eventually be fixed.
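As a rough illustration of that rerouting idea (not Reconstructive's actual code, which is JavaScript running in the browser), the Python sketch below maps a request made by an archived page to a URL under the replay server, reusing the 14-digit datetime from the referring page. The function name `reroute`, the default origin, and the `/<datetime>/<url>` path layout are assumptions made for this example.

```python
import re
from urllib.parse import urlsplit

# Assumed replay path layout: /<14-digit datetime>/<original URL>
URIM_PATH = re.compile(r"^/(\d{14})/(.+)$")


def reroute(request_url, referrer, replay_origin="http://localhost:5000"):
    """Return the archive URL to fetch instead of request_url.

    Returns None when the referrer carries no datetime to reuse.
    """
    if request_url.startswith(replay_origin + "/"):
        return request_url  # already addressed to the replay system
    match = URIM_PATH.match(urlsplit(referrer).path)
    if match is None:
        return None  # referrer is not an archived page
    datetime14 = match.group(1)
    # Send the request to the archive at the referrer's datetime.
    return f"{replay_origin}/{datetime14}/{request_url}"


print(reroute(
    "http://example.com/logo.png",
    "http://localhost:5000/20140114100000/memento.us/",
))
```

With every request rewritten this way, a resource missing from the archive would surface as an error from the replay server instead of silently loading from the live web.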

machawk1 commented 6 years ago

@akavel I believe this is what already occurs. For example, if I run

ipwb index ipwb/samples/warcs/5mementos.warc | ipwb replay

and then curl -i http://localhost:5000/20140114100000/memento.us/, I get an HTTP 200 response, as that's one of the mementos available. However, if I change that command to curl -i http://localhost:5000/20140114100009/memento.us/, I get the following response:

HTTP/1.0 404 NOT FOUND
Content-Type: text/html; charset=utf-8
Content-Length: 611
Link: <memento.us/>; rel="original", <http://localhost:5000/timemap/link/memento.us/>; rel="timemap"; type="application/link-format", <http://localhost:5000/timemap/cdxj/memento.us/>; rel="timemap"; type="application/cdxj+ors", <http://localhost:5000/20130202100000/memento.us/>; rel="first memento"; datetime="Sat, 02 Feb 2013 10:00:00 GMT",    <http://localhost:5000/20161231110001/memento.us/>; rel="last memento"; datetime="Sat, 31 Dec 2016 11:00:01 GMT"
Server: InterPlanetary Wayback Replay/0.2017.12.12.1843
Date: Tue, 19 Dec 2017 23:06:18 GMT

<h1>ERROR 404</h1>No capture found for memento.us/ at 20140114100009.<p>5 capture(s) available:</p><ul><li><a href="/20130202100000/memento.us/">memento.us/ at 20130202100000</a></li><li><a href="/20140114100000/memento.us/">memento.us/ at 20140114100000</a></li><li><a href="/20140115101500/memento.us/">memento.us/ at 20140115101500</a></li><li><a href="/20161231110000/memento.us/">memento.us/ at 20161231110000</a></li><li><a href="/20161231110001/memento.us/">memento.us/ at 20161231110001</a></li></ul><p>TimeMaps: <a href="/timemap/link/memento.us/">Link</a> <a href="/timemap/cdxj/memento.us/">CDXJ</a>
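The Link header in the 404 response above already points to the TimeMaps and to the first and last mementos, so a script can tell "no capture at this datetime" apart from "nothing archived at all". A minimal sketch of reading it in Python follows; the regex-based parser and the name `parse_link_header` are assumptions made for this example, and it only handles quoted parameter values like the ones shown in the response.

```python
import re

# One link-value: <target> followed by zero or more ; name="value" params.
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in LINK_VALUE.findall(value):
        links.append((target, dict(PARAM.findall(raw_params))))
    return links


# Shortened copy of the Link header from the 404 response.
header = (
    '<memento.us/>; rel="original", '
    '<http://localhost:5000/timemap/link/memento.us/>; rel="timemap"; '
    'type="application/link-format", '
    '<http://localhost:5000/20130202100000/memento.us/>; '
    'rel="first memento"; datetime="Sat, 02 Feb 2013 10:00:00 GMT"'
)

links = parse_link_header(header)

# Keep only the links whose rel value includes the "memento" relation.
mementos = [t for t, p in links if "memento" in p.get("rel", "").split()]
print(mementos)
```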

Alternatively, opening http://localhost:5000/20140114100009/memento.us/ in a browser shows:

[Screenshot, 2017-12-19 at 6:06:58 PM]

Can you explain your idea a bit more?