oduwsdl / ipwb

InterPlanetary Wayback: A distributed and persistent archive replay system using IPFS

Prep for ipwb demo at IPFS all-hands on July 17, 2017 #215

Closed by machawk1 7 years ago

machawk1 commented 7 years ago

Per https://twitter.com/daviddias/status/881969834409545730, @diasdavid invited us to demo ipwb at the IPFS all-hands. We ought to outline the best way to demo the software. For reference, below is the demo we originally planned for WAC2017 in London in June. For brevity during the presentation, we were unable to show all of the steps.

  1. Show live page: http://www.cs.odu.edu/~salam/
  2. Login to ws-dl-02, cd ~/public_html/ipwb
  3. ws-dl-02: ipwb index ipwb/samples/warcs/salam.warc > (cdxj)
  4. Show (cdxj) in terminal
  5. ws-dl-02: ipwb replay (cdxj)
  6. Show wac.cdxj on web browser
  7. Download to disk (desktop)
  8. Local: ipwb replay (cdxj)
  9. Show in browser locally
  10. ipfs cat (hash from cdxj)
  11. Go to https://ipfs.io/ipfs/(hash)
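
The command-line portion of the steps above could be scripted roughly as follows. This is only a sketch: it assumes `ipwb` and `ipfs` are installed and on the PATH, the CDXJ file name `salam.cdxj` is made up for illustration, and the hash stays a placeholder because it has to be read out of the generated CDXJ. By default it only prints the commands instead of running them.

```python
import shlex
import subprocess

# The WARC path is the one named in the steps above; the CDXJ file name
# is an assumption made for this sketch.
WARC = "ipwb/samples/warcs/salam.warc"
CDXJ = "salam.cdxj"


def demo_commands(warc=WARC, cdxj=CDXJ, ipfs_hash="<hash from cdxj>"):
    """Return the demo's shell steps as (argv, stdout_file) pairs."""
    return [
        (["ipwb", "index", warc], cdxj),     # step 3: index the WARC, save the CDXJ
        (["cat", cdxj], None),               # step 4: show the CDXJ in the terminal
        (["ipwb", "replay", cdxj], None),    # steps 5 and 8: start the replay server
        (["ipfs", "cat", ipfs_hash], None),  # step 10: fetch content by its hash
    ]


def run_demo(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    lines = []
    for argv, out_path in commands:
        line = " ".join(shlex.quote(arg) for arg in argv)
        if out_path:
            line += " > " + shlex.quote(out_path)
        lines.append(line)
        print(line)
        if not dry_run:
            if out_path:
                with open(out_path, "w") as out:
                    subprocess.run(argv, stdout=out, check=True)
            else:
                subprocess.run(argv, check=True)
    return lines


if __name__ == "__main__":
    run_demo(demo_commands())
```

Note that `ipwb replay` keeps running as a server, so a real (non-dry) run would stop at that step until the server is terminated; in a live demo the steps are better run by hand in separate terminals.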

ibnesayeed commented 7 years ago

Another alternative approach would be to:

[CLAPS]

We can then talk about the proposed new model that would not require exchanging the index file; in fact, it won't require an index file at all. This would allow us to encourage people to prioritize the work on making historical records of IPNS changes, potentially by using the Ethereum blockchain.

Things to do before demo:

machawk1 commented 7 years ago

https://ibnesayeed.github.io/acrts/ does not currently appear all-green to me on the live Web. In fact, all blocks are currently red using Chrome 59. One initial idea that we abandoned due to other caveats and time constraints was to create a WARC on-the-fly that would be indicative of very current information that is likely not archived elsewhere (due to newness). One idea was the live conference Twitter feed. This would be more convincing than the acrts page plus I believe there were still some issues with ipwb+acrts when the acrts page was all-green a few weeks ago.

[Image: screen shot 2017-07-03 at 5 50 16 pm]

I'll leave talking about the new model up to you. This ticket is about what we will demo. #70 still requires your attention, @ibnesayeed, if we are to mention Docker.

ibnesayeed commented 7 years ago

https://ibnesayeed.github.io/acrts/ does not currently appear all-green to me on the live Web. In fact, all blocks are currently red using Chrome 59.

This is as intended and precisely what live resources should look like for tests failing due to live-leakage. The WARC file, however, contains all-green resources, and it was prepared by capturing the site when it was in all-green mode. We are supposed to show the historical (captured green) version and not the live version. That would be another very clear point to notice in the demo, which is not the case with my home page, which doesn't change.

machawk1 commented 7 years ago

Live pages accessing live Web representations should not be considered "leakage".

If we can succinctly unpack the concept of ACRTS, it might be good to use it for a demo, but it does not allow us to illustrate the fundamental ideas behind ipwb without the additional conceptual baggage.

We ought to have a stock set of demos, ideally recorded for posterity and for when the tool inevitably breaks between releases. In this set, using acrts may be demo k in {1,...,k,...,n}. Let's figure out this set, then build up to using acrts. If everything is working as expected by then (which will require testing on both of our parts), we can use acrts. What we don't necessarily want to highlight are the tool's shortcomings; mention and reference, yes, but not highlight.

ibnesayeed commented 7 years ago

Live pages accessing live Web representations should not be considered "leakage".

I think you are missing the point here. The red state is the current state of the page. The page was once green when it was captured, hence the pre-recorded WARC file and no live WARC creation. How did the historical page look? That's yet to be revealed when we get our archival replay system working. How to get it working? Index the WARC file and push the contents to IPFS, run the replay server, and then see how the page once looked in the past. That's the demo. However, while replaying from the archive, if something red shows up, that's a failure of the replay: it is bringing the resource from the live web, which is evident from the live site, which is all red.

ibnesayeed commented 7 years ago

What we don't necessarily want to highlight are the tool's shortcomings; mention and reference, yes, but not highlight.

Indeed, we don't want to demonstrate failures here; hence, ACRTS should only be used if we can get it all green. It was a target demo proposal only. Otherwise, we can fall back to a simpler page such as my home page.

machawk1 commented 7 years ago

The page was once green when it was captured

So it should only be green when being replayed? It's red on the live Web, so for it to be green on replay does not seem like an accurate portrayal of the page in the archive.

I see the sample WARC in that repo was created with wget 1.15. Would that suffice for the capture tool? I can imagine that the "all-green" state is partially dependent on all of the representations being captured.

I still have reservations about this page vs. capturing a newly created page, as the resources in acrts, once pushed to IPFS, will remain there if pinned elsewhere.

On a related note, we ought to be creating tests for ipwb that exercise acrts to programmatically know where ipwb is failing at replay.
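
As a starting point for such tests, a check along these lines could flag live-leakage. It is a rough sketch only: it assumes the list of URLs a browser requested during replay has already been collected by some other means, and the replay origin `http://localhost:5000` and the example paths in the usage are hypothetical, not taken from ipwb itself.

```python
from urllib.parse import urlsplit


def find_live_leaks(requested_urls, replay_origin):
    """Return the requested URLs that were not served by the replay server.

    Any request whose scheme and host differ from the replay origin is
    treated as a leak to the live Web.
    """
    origin = urlsplit(replay_origin)
    leaks = []
    for url in requested_urls:
        parts = urlsplit(url)
        if (parts.scheme, parts.netloc) != (origin.scheme, origin.netloc):
            leaks.append(url)
    return leaks


if __name__ == "__main__":
    # Hypothetical request log from replaying the acrts capture.
    requests_seen = [
        "http://localhost:5000/memento/20170703000000/https://ibnesayeed.github.io/acrts/",
        "https://ibnesayeed.github.io/acrts/style.css",
    ]
    for leak in find_live_leaks(requests_seen, "http://localhost:5000"):
        print("live leak:", leak)
```

An empty result would correspond to the all-green case; each URL returned is a resource that bypassed the archive.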

ibnesayeed commented 7 years ago

So it should only be green when being replayed? It's red on the live Web, so for it to be green on replay does not seem like an accurate portrayal of the page in the archive.

Unless you want to demonstrate "capture" (which is not a feature of IPWB yet), why is it not an accurate portrayal of the page? You are not capturing the live site. The live site is what it is at this very moment. You have a past capture of the page. You will know how it looked when it was captured after you replay it.

I see the sample WARC in that repo was created with wget 1.15. Would that suffice for the capture tool? I can imagine that the "all-green" state is partially dependent on all of the representations being captured.

Don't worry, it is a one hundred percent complete capture. It has every single resource needed to replay that composite memento in any browser and screen resolution. It was captured using a seed list, so even WARCreate can't beat it in completeness.

I still have reservations about this page vs. capturing a newly created page, as the resources in acrts, once pushed to IPFS, will remain there if pinned elsewhere.

Which is not an issue. Resources will only be discovered using their digest. If the content matching that digest is found elsewhere, that is absolutely fine and further highlights the potential of IPFS.
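
To illustrate the point about digest-based discovery, here is a toy sketch. It is not how IPFS is implemented: it uses a plain SHA-256 hex digest where IPFS uses its own hash-based identifiers, and the `ContentStore` class and the two "nodes" are invented purely for this example.

```python
import hashlib


def digest(content: bytes) -> str:
    """Identify content by a hash of its bytes (plain SHA-256 in this sketch)."""
    return hashlib.sha256(content).hexdigest()


class ContentStore:
    """A toy content-addressed store standing in for one node."""

    def __init__(self):
        self._blocks = {}

    def put(self, content: bytes) -> str:
        key = digest(content)
        self._blocks[key] = content
        return key

    def get(self, key: str):
        # Returns None when this node does not hold the content.
        return self._blocks.get(key)


if __name__ == "__main__":
    payload = b"<html>archived page</html>"

    our_node = ContentStore()
    other_node = ContentStore()

    # Two independent parties store the same bytes.
    key_ours = our_node.put(payload)
    key_theirs = other_node.put(payload)

    # Both derive the same key, so either node can serve the request.
    print(key_ours == key_theirs)
    print(other_node.get(key_ours) == payload)
```

Because the key is derived from the bytes alone, it does not matter who added or pinned the content; any holder of matching bytes satisfies the lookup.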

daviddias commented 7 years ago

Woooot! Excited to wake up and see this long thread of preparations for this demo. I really appreciate all the detail to make sure it is awesome ❤️

Let me know how I can be helpful. It might be the case that IPWB really deserves more than one demo; after all, there is a lot of work here and it will take time for people to digest everything :)

daviddias commented 7 years ago

How do we feel about this demo? All set for next Monday?

ibnesayeed commented 7 years ago

@diasdavid where can we get the link to connect to the conference call? Also, what time does it start?

daviddias commented 7 years ago

Starts at 4pm UTC. More info here: https://github.com/ipfs/pm/issues/470

machawk1 commented 7 years ago

@diasdavid Would it be possible for us to hold off on the ipwb presentation for this week? I encountered a pretty major bug in #225 and have been rushing to fix it but have not been able to put in the cycles to adequately show what I want in a presentation. Also, I would like to be more logistically prepared on our end prior to demoing.

If demos happen weekly at the IPFS All-Hands, we can shoot for July 17th.

daviddias commented 7 years ago

@machawk1 understood, no problem at all, we can postpone to July 17th :) Thanks for letting me know!

machawk1 commented 7 years ago

@diasdavid Thanks for understanding! I think the issue(s) and prep on our end can be resolved within this coming week and the extra week will give us a few more cycles to polish/test it before demoing.

@ibnesayeed let's discuss this in-person or on a more suitable medium for outlining than GH comments.

machawk1 commented 7 years ago

The major issue (#225) was fixed at 16:00-ish on the previously planned demo day (July 10). The extra cycles devoted to working on it in lieu of preparing for a problematic presentation were effective. Let's plan for the July 17 demo.

machawk1 commented 7 years ago

Thanks for giving us the opportunity, @flyingzumwalt, @diasdavid, et al.!

daviddias commented 7 years ago

Thanks for showing up to give the demo today @machawk1 and @ibnesayeed 🙌🏽

Let's get that IPFS note to discuss next steps!

machawk1 commented 7 years ago

@diasdavid Done!

https://github.com/ipfs/notes/issues/251