ipfs / in-web-browsers

Tracking the endeavor towards getting web browsers to natively support IPFS and content-addressing
https://docs.ipfs.tech/how-to/address-ipfs-on-web/
MIT License
349 stars 29 forks source link

Signed/Bundled HTTP Exchanges and WebPackage #121

Open lidel opened 6 years ago

lidel commented 6 years ago

This issue tracks ideas, use cases and work related to Web Packaging, especially Signed HTTP Exchanges (SXGs) and Bundled HTTP Exchanges which open the door to associating an origin with content that was not explicitly retrieved from that origin by the browser. Previous workarounds for the "origin problem" can be found in https://github.com/ipfs/in-web-browsers/issues/89 and https://github.com/ipfs/in-web-browsers/issues/66.

Background

Google is championing work on "Web Packaging" to solve MITM (aka "misattribution problem") of the AMP Project. Signed HTTP Exchanges (SXG) decouple the origin of the content from who distributes it. Content can be published on the web, without relying on a specific server, connection, or hosting service, which is highly relevant for IPFS, as it is great at distributing immutable bundles of data.

2018: Signed HTTP Exchanges

A longer overview can be found at developers.google.com: Signed HTTP Exchanges:

2018-11-08--00-02-09

The Google Chrome team is working towards making this an IETF spec and have a prototype built for Chrome with an origin trial starting with Chrome 71.

People would like to use content offline and in other situations where there isn’t a direct connection to the server where the content originates. However, it’s difficult to distribute and verify the authenticity of applications and content without a connection to the network. [..]

Previous attempts at packaging web resources [..] were motivated by speeding up the download of resources from a single server [..] This attempt is instead motivated by avoiding a connection to the origin server at all. #

It is worth noting that this is still a very PoC spec and current version of SXGs is considered harmful by Mozilla and the spec needs further work.

2019 Q3: Bundled HTTP Exchanges, AKA Web Bundles

Web Bundles, more formally known as Bundled HTTP Exchanges, are part of the Web Packaging proposal.

To be precise, a Web Bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type.

More at https://web.dev/web-bundles/

Potential IPFS Use Cases

How does this fit in with P2P distribution? Is the future of web publishing signed+versioned bundles over IPFS?

IPFS as transport for SXG / Web Bundles

Archival Use Cases (Web Bundles)

Learning Materials

WebPackage 101

  1. Fixing AMP URLs with Web Packaging (20min primer on Web Packaging)

    After this talk, you will have a solid grasp on the proposed solution to AMP's URL misattribution problem, and how Cloudflare is positioned to take the necessary steps to provide this fix to existing AMP publishers with minimal setup, and no code required.

  2. Web Packaging Format Explainer

    This document describes use cases for packaging websites and explains how to use the cluster of specifications in this repository to accomplish those use cases. It serves similar role as typical "Introduction" or "Using" and other non-normative sections of specs.

  3. Use Cases and Requirements for Web Packages

    Longer, more comprehensible read. This document lists use cases for signing and/or bundling collections of web pages, and extracts a set of requirements from them.

Known Problems and Concerns

References

Web Packaging Primer

Additional Resources

cc @jimpick @mikeal

jimpick commented 6 years ago

This is pretty crude, but these are the files I used in my demo:

https://github.com/jimpick/signed-exchange-test

Probably the only thing really re-usable in that is the service worker (sw-ipfs.js) which intercepts requests and loads the associated .sxg file from the ipfs.io gateway.

mikeal commented 6 years ago

I doubt I'll have time to dive into this before I go on vacation but this is awesome!

jimpick commented 6 years ago

Instructions on how to get into the origin trial here:

https://twitter.com/kinu/status/1055825077281939456

jimpick commented 6 years ago

Another link!

https://developers.google.com/web/updates/2018/11/signed-exchanges

lidel commented 6 years ago

Content servers: If you want to host SXGs created by publishers on their behalf, you can participate in the origin trial to have the SXGs processed by Chrome without requiring your users turn on a flag. – /signed-exchanges#participate_in_the_origin_trial

IIUC we could look into enabling this on our HTTP Gateway, that way we could demo .sxg with regular Chrome without passing any special flags.

lidel commented 5 years ago

A new talk just landed: :movie_camera: From Low Friction to Zero Friction with Web Packaging and Portals (Chrome Dev Summit 2018)

The focus is on UX, but I gathered highlights related to Web Packaging:

ps. Our Origin Trial setup is tracked in https://github.com/ipfs/infrastructure/issues/453

lidel commented 5 years ago

PSA:
Origin Trial for Signed HTTP Exchanges is enabled for ipfs.io Gateway

This means anyone can publish SXG on IPFS and it loads in regular Chrome 71 without any additional setup on user side.

Quick demo:

  1. Install Google Chrome 71
  2. Open SXG from our gateway, for example: https://ipfs.io/ipfs/QmVnnXjwXyEKhnrC1L7wegepUum2zN4JZUgtvA7DYtj4rG/sxg-location.sxg
  3. You will see location bar being replaced with Origin read from SXG! For the sample above it will be a localhost URL:

    2018-11-29--19-28-37

To create .sxg with Origin of your own domain follow steps from #creating_your_sxg.

This is just a brief update, expect a post at blog.ipfs.io with more details soon.

jimpick commented 5 years ago

This is so exciting! I'm going to experiment a bit with this later...

jimpick commented 5 years ago

I had a good meeting with the Google Chrome HTTP Signed Exchanges team in Tokyo today. I prepared a little demo:

https://ipfs.v6z.me/

It only works with Chrome Canary (Chrome Beta doesn't seem to work).

The top-level "bootstrap" website with the original index.html and service worker is published to IPFS (using IPNS). That's given it's own SSL certificate using https://cloudflare-ipfs.com/

Then the web content is processed with gen-signedexchange to generate a bunch of .sxg "HTTP Signed Exchange" files, which are published to IPFS (not using IPNS), and finally a 'ipfs-hash.txt' file with the hash of the content is written to the bootstrap site. The service worker looks at that file, and for any file that is being fetched, it will generate a redirect to the published content .sxg files hosted on the public ipfs.io gateway that Protocol Labs runs (which has the correct HTTP headers for the origin trial).

It's a little hard to explain to somebody unfamiliar with all the parts involved. Now that the demo is actually working, I'd love to do a proper blog post for it!

Source code for the demo: https://github.com/jimpick/signed-exchange-test/tree/ipfs.v6z.me-origin-trial

(sorry, no documentation yet ... I only get it working yesterday)

lidel commented 5 years ago

Spec change opening ability to load SXG from locally running IPFS node: https://github.com/WICG/webpackage/pull/352

kyledrake commented 5 years ago

Heads up that there's an AMP conf coming up in Tokyo that will likely have relevant discussion https://www.ampproject.org/amp-conf/

lidel commented 5 years ago

PSA: Origin Trial ends on Mar 6, 2019 – I will extended our token till the trial end.

lidel commented 5 years ago

Discussion about Service Worker and subresource SXG prefetching integration:

lidel commented 5 years ago

Cloudflare announced seamless generation of SXG for existing websites as "AMP Real URL". The feature will be available for free.

This older blogpost contains details on how signed content can be announced to the crawler.

jimpick commented 5 years ago

I tried copying an .sxg file found "in the wild" to IPFS and loading it through the gateway:

https://ipfs.io/ipfs/QmcMMFKpj4WtnfDinDh6vuTU5ViQD5ncVtRvTzXWYEyo5w/test1.sxg

In Chrome DevTools, the following error was displayed:

Screenshot 2019-04-29 22 53 05

Looks like we might need to tweak the header on the gateway.

jimpick commented 5 years ago

Good news, the gateway looks like it's updated and .sxg files are loading. :-)

https://ipfs.io/ipfs/QmWgYzCJuNupFeX1RLv27srqU1t7z6HJamUMeR9rm1zF2w

Edit: Actually, not yet ... I checked the headers, and they aren't updated yet. I think that's using a fallback. This stuff is confusing.

jimpick commented 5 years ago

Found a video of an IETF presentation on Web Packaging from this March https://youtu.be/woLbXaX0Gf4?t=700

Interesting that it is being presented to the IETF as a peer-to-peer technology!

Also, listen to the questions to hear @ekr from Mozilla express his strong "considered harmful" position in person.

jyasskin commented 5 years ago

@jimpick Yep, the original use case was around peer-to-peer content distribution in places where mobile data is very expensive or unreliable. We only later realized we could think of the AMP cache as a "peer".

The big unsolved problem for our peer-to-peer model is the way clients discover packages. Doing it naively in the client gives the cache a full view of the client's browsing history, which isn't acceptable. When the peer is on the internet (e.g. AMP), the source of the link to the resource (e.g. Google Search) can provide discovery without leaking any more information. Maybe IPNS can be a more general way to discover packages cached nearby, if its privacy properties are right?

jimpick commented 5 years ago

@jyasskin You are absolutely correct in saying that naively sharing peer-to-peer will expose users privacy. Full privacy is a tough problem to design for.

We're actively working on enhancements to IPNS and the DHT to improve performance. And there is a lot of ongoing work in libp2p for private networks and relaying. I think it would be neat if it would be possible to be able to restrict lookups so that content is only ever retrieved via privacy preserving mechanisms, and that content can be shared or re-shared without danger. There's often going to be a tradeoff in privacy vs. performance.

To make things worse, many politicians, intelligence agencies, police forces and even corporate IT departments are opposed to true anonymity, so it gets into really tricky legal territory.

For non-client applications, there are many datasets which are essentially public and for which most people would prefer performance since the privacy concerns aren't too much of a problem. That's one reason we're primarily focused on package managers and performance this year.

jimpick commented 5 years ago

Here's my take on the WebPackage controversy, which is fundamentally a rethink about SSL certificates and what they represent to the reader:

Right now, with AMP and HTTP Signed Exchanges, it is now possible for the "cached" exchanges to not transported directly from the Washington Post, but they are instead coming from Google or Cloudflare's CDN, which is not going to be spying on folks (they claim). Google is using their clout to provide what they call "privacy-preserving pre-fetch". Of course, if you are wearing a tinfoil hat, and you distrust Google, or the government, you might not think your privacy was preserved if Google's CDN is seeing all the documents being fetched.

The problem with peer-to-peer distribution and the experiments we (and others, such as @pfrazee at Beaker) are doing is that we are opening up the distribution to everybody, and the reader privacy problems get very tricky. So displaying the content with a lock saying it has come from "The Washington Post" might be true, but it's also quite possible that by retrieving the content via a peer-to-peer mechanism, there was a digital trail left, and reader privacy has been compromised ... so the "lock" displayed in the UI is misleading people to think that nobody can spy on them.

There has been much discussion about the reader privacy problem in the Dat community:

Clearly, there are ways to improve reader privacy on peer-to-peer networks. For example, access could be made using Tor. Or via an encrypted link to a place that the reader trusts. Content could be distributed via broadcast (eg. satellite) and multicast mechanisms, so there are no direct accesses. Not accessing things directly but via trusted intermediaries and privacy-preserving peer-to-peer networks could actually be a privacy improvement. Peer-to-peer distribution has clear advantages when it comes to censorship resistance.

I wonder if peer-to-peer web browsers for the distributed web need more than one UI element to display trust and privacy information?

Two cases:

Is this an area that could benefit from UX research?

lidel commented 5 years ago

UX issues around "HTTPS spoofing"

To expand on @jimpick's take, I believe contributing factor to the controversy is the UX of how SXG@v=b3 got implemented in Google Chrome. Looking from sidelines it may feel rushed and AMP-driven.

To be specific, Google Chome makes SXG indistinguishable from regular HTTPS, which breaks basic assumptions around how users understand the green padlock in location bar (aka "nobody but me and the Origin server can see the payload"). UX of regular HTTPS is reused as-is, pretending that end-to-end HTTPS transport was used with Origin from location bar, which is not true.

Browser should be the user agent, and as one it should never lie or break this type of trust.

To me it feels like UX problem. There should be a different presentation in location bar than re-using the green padlock from HTTPS. Browser should be honest that WebPackage was used and show who was involved in rendering the page: who is the Publisher, when package was created, who was involved in Distributing the content etc.

Need for Demonstrating Archival Use Cases

I believe archiving is a missed opportunity to make a case for WebPackage and figure out technical details and UX in browser without going into politics of PKI and HTTPS spoofing.

Would love to see more happening around this use case. Browsers could add support for saving a website to a WebPackage bundle and loading it from it while making it obvious to the user that they are looking at an archived snapshot, with all details at hand.

This would add real value to the web by empowering individuals and institutions (Internet Archive, Wikipedia) with tools to fight the link rot and censorship. Imagine all Wikipedia References as reproducible snapshots of articles that could be downloaded, shared and read offline.

Worth looking at is the potential overlap with W3C's Packaged Web Publications: https://github.com/w3c/pwpub

Gateway Update: ipfs.io supports v=b3

Good news: we've updated HTTP headers at our IPFS Gateway. Errors from https://github.com/ipfs/in-web-browsers/issues/121#issuecomment-487828624 should be gone, responses for ipfs.ip/ipfs/**.sxg now include:

Content-Type: application/signed-exchange;v=b3
X-Content-Type-Options: nosniff

Test in Chrome 74+: index.html.sxg :)

Gozala commented 5 years ago

To me it feels like UX problem

I think problem is far greater than UX, that is users are not in control - If all the pages visited through chrome are served through AMP regardless of icon in the location bar user privacy is compromised.

jimpick commented 5 years ago

I found some more videos (thanks YouTube) that go over WebPackaging and Signed HTTP Exchanges in quite a bit of detail.

BlinkOn 9 (April 2018): https://www.youtube.com/watch?v=rcJ9BLymVQE

BlinkOn 10 (April 2019): https://www.youtube.com/watch?v=iTYr5qVbHdo

lidel commented 5 years ago

Mozilla published 15 page paper which reaffirms their position: https://github.com/mozilla/standards-positions/issues/29#issuecomment-495122302

Concentrating on security issues is relatively easy. Coming to terms with a fundamental change to the security and content delivery model of the web is a more difficult task. This document tries to go further and explore other potentially problematic parts in the technology. [...] The increased exposure to security problems and the unknown effects of this on power dynamics is significant enough that we have to regard this as harmful until more information is available.

Quick takeaways:

ibnesayeed commented 5 years ago

I believe archiving is a missed opportunity to make a case for WebPackage and figure out technical details and UX in browser without going into politics of PKI and HTTPS spoofing.

We have an accepted position paper entitled, "Supporting Web Archiving via Web Packaging" (preprint version) in IAB's ESCAPE 2019 Workshop and hope to take this conversation there.

Last year in Web Archiving and Digital Libraries (WADL) 2018 Workshop we illustrated a mock up of UI/UX element that browsers can show in the address bar to acknowledge users about the state of a resource being archived (i.e., a memento). The icon can reveal a lot more context and metadata about the memento when clicked/tapped on (see slide #76 of the keynote talk).

Archive Icon

ibnesayeed commented 5 years ago

I am back from the ESCAPE Workshop last week. I learned a lot and conveyed the message/issues/needs related to web archiving which was taken very well. I can recap my slides on Supporting Web Archiving via Web Packaging in one of the upcoming IPFS Weekly Calls if there is any interest in it.

jimpick commented 5 years ago

@ibnesayeed I'm super interested! Would you like to present it next week? I still haven't lined up a speaker yet.

ibnesayeed commented 5 years ago

@ibnesayeed I'm super interested! Would you like to present it next week? I still haven't lined up a speaker yet.

@jimpick, next Monday (July 29) works for me for now.

jimpick commented 5 years ago

@ibnesayeed Great, I'll send you an email.

lidel commented 5 years ago

Updates in support for creating Bundled HTTP Exchanges from a list of URL:

lidel commented 5 years ago
lidel commented 4 years ago

Bundled HTTP Exchanges AKA Web Bundles

https://web.dev/web-bundles/:

To be precise, a Web Bundle is a CBOR file with a .wbn extension (by convention) which packages HTTP resources into a binary format, and is served with the application/webbundle MIME type.

The go/bundle CLI is currently the easiest way to bundle a website.

ibnesayeed commented 4 years ago

DSHR's take on Web Packaging for Web Archiving.

ibnesayeed commented 4 years ago

Content-Based Origins for the Web

lidel commented 4 years ago

Thank you for sharing! Interesting part: Content-Based Origin Definition:

For instance, the origin of content comprising the single ASCII character 'a' is represented as ni:///sha-256;ypeBEsobvcr6wjGzmiPcTaeG7_gUfE5yuYB3ha_uSLs.

If support for something like this lands in major browsers, it could be alternative way to introduce content-addressed origins, something like:

jyasskin commented 4 years ago

FYI, we're working on defining a way to name resources inside packages/bundles, at https://mailarchive.ietf.org/arch/msg/wpack/8fFVJv0AIksODEha8iyJrVDvOek/ (and other threads on that mailing list). I favor a new URL scheme, but it's possible it'll be something else. These names would be "before" any signing or other authenticity information is used.

Part of the most recent proposal is to simplify the URL scheme by embedding an assumption that the package is addressable using an https: or file: URL, which might exclude some use cases folks here care about. If that (or any other part of any of the proposals) is a problem, please speak up on the wpack@ietf.org list.

lidel commented 4 years ago

Thank you for the ping @jyasskin, replied in the thread, copy below.

tl;dr we believe URIs should be explicit, not implicit, and assuming https will be the answer to all use cases is not future-proof enough

Click to show copy of my reply > On 26/08/2020 20:00, Jeffrey Yasskin wrote: > > URLs for nested bundle resources > > > > ``` > > package://distributor.example/bundle.wbn$app/foo.wbn$bar.html > > package:///c:/Users/name/Downloads/bundle.wbn$app/foo.wbn$bar.html > > ``` > > > > This fetches the outer bundle by removing everything after the first `$` and: > > > > * If the URL has an authority, replacing the `package:` with `https:` > > * If the URL doesn't have an authority, replacing the `package:` with `file:`. > > > > To allow the outer bundle to be fetched using a third scheme, we would need to add a matching new scheme for addressing inside it. > > This proposal sneaks in a controversial change of making the internal protocol schemes to be implicit "https:" > It comes at a cost of protocol lock-in, which puts future use cases that rely on decentralized content routing and discovery in jeopardy. > > For example, it makes it impossible for a resource inside of a bundle to use ipfs:// URI (content-addressed protocol) for making network requests that: > (1) do not require bundle to point at a specific location of the data (heals link rot, improves resilience of a bundle) > (2) provide better integrity guarantees than HTTPS (content-addressing means received data can be validated in trust-less manner, based on the hash from address itself) > > In the interest of supporting distributed use cases, and keeping door open for future innovation in this space I suggest changing this proposal to avoid implicit defaults of https and always use explicit protocol scheme ensure arbitrary URI can be used by user agents in the future. > > Marcin
lidel commented 3 years ago

Related efforts at https://github.com/littledan/resource-bundles:

Q: How does this proposal relate to the Web Package/Web Packaging/Web Bundles/Bundled Exchange effort (repo)?

A: This is the same effort, really, with a particular scope. In particular, this repository has a focus on same-origin static subresource loading, while preserving the semantics and integrity of URLs. The Google Chrome team (including Jeffrey Yasskin and Yoav Weiss) have been collaborating closely on this project; there is no competition going on, but rather iteration and exposition.

lidel commented 2 years ago

Potentially related: Intent to Prototype: Isolated Web Apps mentions "Web Packaging" and "Web Bundles".

guest271314 commented 1 year ago

@lidel

Potentially related: Intent to Prototype: Isolated Web Apps mentions "Web Packaging" and "Web Bundles".

Shipped in Chromium Dev Channel, see https://github.com/GoogleChromeLabs/telnet-client.

guest271314 commented 10 months ago

@lidel FWIW This https://github.com/guest271314/webbundle is a repository that uses the minimal dependencies (I've found Rollup to use less dependencies than Webpack) to create a Signed Web Bundle for source for an Isolated Web App.

The authors wrote the code using node:crypto, which is a different animal from Web Cryptography API. I've been having a helluva time trying to figure out how to substititute Web Cryptography API for node:crypto, ultimately to create the SIgned Web Bundle and Isolated Web App in the browser.