w3c / strategy

team-strat, on GitHub, working in public. Current state: DRAFT

Signed HTTP Exchanges #171

Open sideshowbarker opened 5 years ago

sideshowbarker commented 5 years ago

allow people to bundle together the resources that make up a website, so they can be shared offline, either with or without a proof that they came from the original website
An HTTP exchange consists of an HTTP request and its response.

A publisher (like https://theestablishment.co/) writes (or has an author write) some content and owns the domain where it's published. A client (like Firefox) downloads content and uses it. An intermediate (like Fastly, the AMP cache, or old HTTP proxies) downloads content from its author (or another intermediate) and forwards it to a client (or another intermediate).

When an HTTP exchange is encoded into a resource, the resource can be fetched from a distributing URL that is different from the publishing URL of the encoded exchange.

Use cases: https://wicg.github.io/webpackage/draft-yasskin-webpackage-use-cases.html
Explainer: https://github.com/WICG/webpackage/blob/master/explainer.md

As noted at https://www.chromestatus.com/feature/5745285984681984, representatives from both the Firefox dev team and the Safari dev team have expressed unwillingness to implement:

The gist of the feedback is that the associated security considerations are so serious that the Signed HTTP Exchanges feature is harmful:

Using signed HTTP exchanges to enhance the security of accessing a resource or verifying its authenticity seems like a good thing; but it seems positively harmful to use signed HTTP exchanges as a replacement for the longstanding web security model

See also https://github.com/w3c/strategy/issues/96

codedokode commented 5 years ago

I don't like that this spec allows changing URL bar contents. If we look at Google's documentation on the AMP Viewer, there is a screenshot where the URL bar displays "www.amp.dev" while the content is in fact fetched from Google's servers. The user might think they are connecting to amp.dev, but they are actually connected to Google, and Google is collecting their data, including their IP address, according to its policy.

I think the URL bar should show the real URL from which the content was loaded.

The author of the spec mentions this:

Two search engines have built systems to do this with today’s technology: Google’s AMP and Baidu’s MIP formats and caches allow them to prefetch search results while preserving privacy, at the cost of showing the wrong URLs for the results once the user has clicked. A good solution to this problem would show the right URLs but still avoid a request to the publishing origin until after the user clicks.

But I don't understand why they call the URL of the cache "wrong". It is the actual URL, not a wrong one. The user connects to Google's servers over a TLS connection authenticated by Google's certificate. The address bar should display google.com in this case.

sideshowbarker commented 5 years ago

Mozilla’s Position on Web Packaging https://docs.google.com/document/d/1ha00dSGKmjoEh2mRiG8FIA5sJ1KihTuZe-AXX1r8P-8/edit

From a technical standpoint, the changes are thorough and well-considered. There are some technical costs around security, operations, and complexity, but the specifications take steps to limit most of these costs. Many of the technical concerns are relatively minor. There are security problems, but most are well managed. There are operational concerns, but those can be overcome. …we don’t understand enough to say definitively that this is damaging to the system

In making an assessment about value, we have to see what benefits are realized, by whom. The main concern is web packaging might be employed to alter power dynamics between aggregators and publishers. …until more information is available on the effect on the web ecosystem, Mozilla concludes that it would not be good for the web to deploy web packaging

wseltzer commented 5 years ago

Discussion on the IETF wpack list and at IETF 105 suggests a BoF at IETF 106.

ylafon commented 4 years ago

See https://datatracker.ietf.org/doc/html/draft-thomson-escape-report

sideshowbarker commented 4 years ago

Content-Based Origins for the Web https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html

Content-based origins are proposed as an alternative to signed exchanges.

https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html#name-content-based-origin-defini

A content-based origin ascribes an identity to content based on the content itself. For instance, a web bundle [BUNDLE] is assigned a URI based on its content alone.

The sequence of bytes that comprises the content or bundled content is hashed using a hash function that is strongly resistant to collision and pre-image attack, such as SHA-256 [SHA-2]. The resulting hash is encoded using the Base 64 encoding with an URL and filename safe alphabet [BASE64].

This can be formed into the ASCII or Unicode serialization of an origin based on the Named Information URI scheme [NI]. This URI is formed from the string "ni:///", the identifier for the hash algorithm (see Section 9.4 of [NI]); a semi-colon (";"), and the base64url encoding of the hash function output. Though this uses the ni URL form, the authority and query strings are omitted from this serialization.

For instance, the origin of content comprising the single ASCII character 'a' is represented as ni:///sha-256;ypeBEsobvcr6wjGzmiPcTaeG7_gUfE5yuYB3ha_uSLs.
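The 'a' example above can be reproduced in a few lines of Python. This is only a sketch of the hash-then-base64url steps the draft describes (the function name is ours), not a full implementation of the ni URI scheme — in particular it hardcodes SHA-256:

```python
import base64
import hashlib

def content_based_origin(content: bytes) -> str:
    """Serialize a content-based origin as described above:
    SHA-256 the bytes, base64url-encode the digest without
    padding, and prefix with the "ni:///sha-256;" form."""
    digest = hashlib.sha256(content).digest()
    encoded = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"ni:///sha-256;{encoded}"

print(content_based_origin(b"a"))
# → ni:///sha-256;ypeBEsobvcr6wjGzmiPcTaeG7_gUfE5yuYB3ha_uSLs
```

Reproducing the draft's own example also makes the later point concrete: without the content itself, there is no way to guess this origin.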

https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html#section-3.3

Signed exchanges … in effect, they add an object-based security model to the existing channel-based model used on the web. Signatures over bundles (or parts thereof) are used by an origin to attest to the contents of a bundle.

Having two security models operate in the same space potentially creates an exposure to the worst properties of each model.

In comparison, content-based origins do not require signatures. Questions of validity only apply at the point that a state transfer is attempted.

This avoids the complexity inherent to merging two different security models, but the process of state transfer could be quite complicated in practice… content-based origins aren't prevented from interacting with HTTP origins, which could lead to surprising outcomes if existing code is unprepared for this possibility

https://martinthomson.github.io/wpack-content/draft-thomson-wpack-content-origin.html#name-communication-between-origi

Without knowledge of the content of a resource, or bundle of resources, a content-based origin will be impossible to guess. This means that communication is only possible if the frame in which the content is loaded is created by the origin attempting communication, or the content is known to that origin.

iherman commented 4 years ago

There is also this, which looks very close to the first option listed there:

https://tools.ietf.org/html/draft-sporny-hashlink-04

When using a hyperlink to fetch a resource from the Internet, it is often useful to know if the resource has changed since the data was published. Cryptographic hashes, such as SHA-256, are often used to determine if published data has changed in unexpected ways. Due to the nature of most hyperlinks, the cryptographic hash is often published separately from the link itself. This specification describes a data model and serialization formats for expressing cryptographically protected hyperlinks. The mechanisms described in the document enable a system to publish a hyperlink in a way that empowers a consuming application to determine if the resource associated with the hyperlink has changed in unexpected ways.

Cc: @msporny

msporny commented 4 years ago

https://tools.ietf.org/html/draft-sporny-hashlink-04

Yes, that could be a partial solution; it's backwards compatible and doesn't break the existing Web model, since it works by tacking ?hl=XYZ onto the URL, though it's a bit of a hack.
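As a rough illustration of the ?hl= idea, here is a consumer-side check. Note this is a simplified sketch: the actual hashlink draft encodes the hash as a multibase-encoded multihash, whereas this sketch assumes a plain unpadded base64url SHA-256 value purely for illustration, and the URL is made up:

```python
import base64
import hashlib
from urllib.parse import parse_qs, urlparse

def verify_hashlinked_content(url: str, body: bytes) -> bool:
    """Check fetched bytes against the hash carried in the URL's
    ?hl= query parameter. Simplified: assumes an unpadded base64url
    SHA-256 value rather than the draft's multibase multihash."""
    params = parse_qs(urlparse(url).query)
    expected = params.get("hl", [None])[0]
    if expected is None:
        return False  # no hashlink present, nothing to verify
    digest = hashlib.sha256(body).digest()
    actual = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return actual == expected

url = "https://example.com/doc?hl=ypeBEsobvcr6wjGzmiPcTaeG7_gUfE5yuYB3ha_uSLs"
print(verify_hashlinked_content(url, b"a"))  # True: content matches the link
print(verify_hashlinked_content(url, b"b"))  # False: content has changed
```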

The other option, which is being picked up by IETF's HTTP WG is:

https://github.com/richanna/request-signing/blob/revise/draft-richanna-http-message-signatures-00.txt

... which is capable of digitally signing HTTP headers from the server. Again, this doesn't break the Web's security model, but it gives you the option of doing a HEAD request, getting the content hash of what you should be receiving as well as who signed that hash (the original domain), and then you have options on where you get the content from. This is also backwards compatible with the security model of the Web.
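The HEAD-then-fetch flow described here could look roughly like the following sketch. The "sha-256=&lt;base64&gt;" digest format (loosely in the style of RFC 3230 Digest headers), the function name, and the omission of signature verification on the header are all simplifying assumptions, not anything taken from the drafts:

```python
import base64
import hashlib

def verify_mirrored_content(origin_digest_header: str, mirrored_body: bytes) -> bool:
    """Hypothetical flow: a HEAD request to the publisher returns a
    content digest header (modeled here as "sha-256=<base64 digest>");
    the body may then be fetched from any mirror and checked against
    that digest. Verifying the signature over the header is omitted."""
    alg, _, expected = origin_digest_header.partition("=")
    if alg != "sha-256":
        return False  # only one algorithm supported in this sketch
    digest = hashlib.sha256(mirrored_body).digest()
    return base64.b64encode(digest).decode("ascii") == expected

# Digest obtained from the publisher; body fetched from a mirror.
header = "sha-256=ypeBEsobvcr6wjGzmiPcTaeG7/gUfE5yuYB3ha/uSLs="
print(verify_mirrored_content(header, b"a"))  # True: mirror served the real content
```

The point of the design is that trust comes from the publisher's signed digest, so where the bytes physically come from stops mattering.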

PS: Not taking a position on the Signed HTTP Exchanges discussion as I'm sure it would take me a week to catch up on the current status of that discussion. :)

sambacha commented 4 years ago

Mozilla’s Position on Web Packaging docs.google.com/document/d/1ha00dSGKmjoEh2mRiG8FIA5sJ1KihTuZe-AXX1r8P-8/edit

From a technical standpoint, the changes are thorough and well-considered. There are some technical costs around security, operations, and complexity, but the specifications take steps to limit most of these costs. Many of the technical concerns are relatively minor. There are security problems, but most are well managed. There are operational concerns, but those can be overcome. …we don’t understand enough to say definitively that this is damaging to the system

In making an assessment about value, we have to see what benefits are realized, by whom. The main concern is web packaging might be employed to alter power dynamics between aggregators and publishers. …until more information is available on the effect on the web ecosystem, Mozilla concludes that it would not be good for the web to deploy web packaging

Here is an assessment of value:

Per industry standards, certificates that include the Signed HTTP Exchanges extension have a maximum validity of 90 days.

DigiCert charges $198 for a Signed-Exchanges-enabled certificate (on top of the ordinary certificate you must already have).

With the 90-day validity limit, that's four certificates a year, or $792 extra. To use your own domain name, because of AMP.

Why even bother using domain names at all? Why not just let Google host everything?