w3c / navigation-timing

Navigation Timing
https://w3c.github.io/navigation-timing/

Navigation Timing behavior for HTTP Exchange (SxG) loading #107

Open igrigorik opened 5 years ago

igrigorik commented 5 years ago

SxG loading

For a brief intro on HTTP Exchanges see: SxG 101. As a tl;dr...

  1. The user navigates to https://distributor.example.org/foo.sxg, which is a signed exchange that provides a Signature header field.
  2. The user agent validates the field value of the Signature header field by:
    1. Making a request to the origin’s certificate reference, or fetching a cached copy hosted by the distributor (as a means to avoid leaking SxG loads), to validate the provided content signature.
      • Note: the certificate reference response is cached subject to regular HTTP cache semantics. There are no restrictions or HTTP cache caveats for the cert response. As such, if a cached cert reference response is available in the HTTP cache, no external request will be made to validate the signature.
    2. If valid, the user agent issues an "internal redirect" to the signed URL, with the stashed exchange attached to the request.
    3. Otherwise, the user agent redirects to the signed URL; see https://github.com/WICG/webpackage/issues/397.
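
The steps above can be sketched roughly as follows. This is only an illustration of the control flow, not how any user agent implements it: `loadSignedExchange`, `SxgLoadResult`, and the `certCache` set are hypothetical names, signature validity is passed in as a boolean rather than computed, and the cache ignores real HTTP cache semantics such as freshness.

```typescript
// HYPOTHETICAL sketch: none of these names come from the SxG spec or from Chrome.
type CertSource = "http-cache" | "network";

interface SxgLoadResult {
  finalUrl: string;            // the signed URL the navigation ends up on
  certSource: CertSource;      // where the certificate response came from
  servedFromExchange: boolean; // true if the stashed exchange satisfies the request
}

function loadSignedExchange(
  signedUrl: string,
  certUrl: string,
  certCache: Set<string>,      // stand-in for the HTTP cache (assumption)
  isSignatureValid: boolean,   // stand-in for the real signature check (assumption)
): SxgLoadResult {
  // Step 2.1: a cached certificate response means no external request is made.
  const certSource: CertSource = certCache.has(certUrl) ? "http-cache" : "network";
  certCache.add(certUrl);

  // Steps 2.2 and 2.3: both outcomes land on the signed URL. Only a valid
  // signature lets the stashed exchange be attached to the request.
  return { finalUrl: signedUrl, certSource, servedFromExchange: isSignatureValid };
}
```

In this sketch, the first load of a given certificate reports `"network"` and later loads report `"http-cache"`, which is the case where validation adds no external request.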

Brief aside: Chrome implementation & plumbing

In Chrome’s implementation, SxG validation issues an “internal redirect” if the validation passes, which is similar to how we handle HSTS: when an HTTP navigation is initiated, if the origin is matched against a known HSTS origin, an internal redirect is issued to redirect the navigation to HTTPS. From a user's and Navigation Timing perspective, this "internal redirect" is not observable, as the user agent effectively “rewrites” the URL of the request on the fly.


Note: the "internal redirect" is observable via Chrome DevTools but is not exposed via the Navigation Timing or Resource Timing APIs; "internal redirects" are a Chrome implementation detail.
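
To make the HSTS analogy concrete, here is a minimal sketch of an on-the-fly URL rewrite. The function name `applyInternalHstsRedirect` and the `knownHstsHosts` set are assumptions for illustration; the sketch matches hosts exactly and ignores `includeSubDomains` and expiry, so it is not a description of Chrome's actual code.

```typescript
// HYPOTHETICAL sketch of an HSTS-style "internal redirect": the URL is rewritten
// before any request is sent, so no HTTP redirect response is ever involved.
function applyInternalHstsRedirect(
  requestUrl: string,
  knownHstsHosts: ReadonlySet<string>, // assumption: exact host matches only
): string {
  const url = new URL(requestUrl);
  if (url.protocol === "http:" && knownHstsHosts.has(url.hostname)) {
    // Upgrade the scheme in place; path, query, and fragment are preserved.
    url.protocol = "https:";
  }
  return url.href;
}
```

Because the rewrite happens inside the user agent, the page only ever sees the final HTTPS URL, which is why no redirect shows up in Navigation Timing.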


SxG + Navigation Timing use cases

As a site owner that’s providing SxG resources to multiple distributors, I need to…

(A) Be able to gather RUM telemetry from SxG and origin-served responses

This works without any modifications to NT — hooray — no action needed.

(B) Be able to distinguish SxG loads from origin-served responses

The performance characteristics of SxG loads can be substantially different from origin loads (and not necessarily in a positive way) and we (publishers, content owners) need to be able to segment these populations. Further, given that there can be many different distributors serving the SxG, there needs to be a signal to distinguish between origin and each of the distributors serving our content. For example, if we detect that a particular distributor is delivering a degraded experience, we need to be able to identify it through telemetry to take action to address the issue: contact the distributor to address the issue, revoke the SxG from that distributor, etc.

(C) Be able to measure the validation cost and overhead on response

Signature validation can require external requests that can add significant overhead to validate the exchange, in addition to the CPU overhead to validate the signature. This cost needs to be made visible to allow site owners to evaluate costs vs. benefits of providing SxGs.

Related discussions:


Surfacing SxG in NavTiming

Applying the same approach for how we handle HSTS in Navigation Timing today, one plausible route to address the above use cases would be to provide...

Providing distributorName addresses (B), sxgValidation{Start,End} addresses (C).
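
As a rough illustration of how a RUM script might consume such attributes if they existed: the sketch below assumes `distributorName`, `sxgValidationStart`, and `sxgValidationEnd` appear as optional fields on a navigation entry. These fields are only the proposal from this issue and are not part of any shipped API; `ProposedSxgTiming`, `SxgTelemetry`, and `summarizeSxgLoad` are hypothetical names.

```typescript
// HYPOTHETICAL: these fields mirror the proposal in this issue, not a shipped API.
interface ProposedSxgTiming {
  distributorName?: string;    // absent (or empty) for origin-served loads (assumption)
  sxgValidationStart?: number; // ms, same time base as other timing attributes (assumption)
  sxgValidationEnd?: number;
}

interface SxgTelemetry {
  segment: string;             // "origin" or the distributor name, for use case (B)
  validationMs: number | null; // validation cost, for use case (C)
}

function summarizeSxgLoad(entry: ProposedSxgTiming): SxgTelemetry {
  // (B): segment by distributor, falling back to "origin" when none is reported.
  const segment = entry.distributorName || "origin";

  // (C): validation cost is the gap between the two proposed timestamps.
  const start = entry.sxgValidationStart;
  const end = entry.sxgValidationEnd;
  const validationMs =
    start !== undefined && end !== undefined && end >= start ? end - start : null;

  return { segment, validationMs };
}
```

A collector could then group beacons by `segment` and compare `validationMs` across distributors.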

Crazytalk?


Note that exposing the above would make information about the SxG load available to JavaScript, which may have security/privacy implications that are being explored in https://github.com/WICG/webpackage/issues/433. However, we may not need the full distributor URL to satisfy the above use cases: the most important bit is being able to differentiate origin vs. distributor, and to differentiate between distributors, which, I think, could be satisfied by exposing a subset of the outer URL (e.g. stripped of query parameters).
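
One way to picture "a subset of the outer URL" is the sketch below, which drops the query string and fragment and can optionally drop the path as well. `coarsenOuterUrl` and its `keepPath` parameter are hypothetical; which subset, if any, is safe to expose is exactly the open question.

```typescript
// HYPOTHETICAL helper: reduces an outer (distributor) URL to a coarser form.
function coarsenOuterUrl(outerUrl: string, keepPath: boolean = true): string {
  const url = new URL(outerUrl);
  // Query parameters and the fragment are always dropped; the path is optional.
  return keepPath ? url.origin + url.pathname : url.origin;
}
```

For example, `coarsenOuterUrl("https://distributor.example.org/foo.sxg?user=123")` yields `https://distributor.example.org/foo.sxg`, and passing `false` as the second argument reduces it to the origin only.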

Paging @jyasskin @sleevi @horo-t for input, feedback, and guidance :)

yoavweiss commented 5 years ago

The above makes sense to me.

sleevi commented 5 years ago

“it seems like the use-case can be satisfied by only reporting the distributor’s eTLD+1, instead of the full URL”

I’d be really uncomfortable with this; both in that I don’t think we should build any new web platform features using the PSL (origin or URL are the right primitives), but also because it doesn’t provide any meaningful privacy boundary, which I’m (perhaps incorrectly) assuming is the goal.

I think we probably want to sit on this issue until https://github.com/WICG/webpackage/issues/433 is sorted - that seems to get to the core about whether it can be exposed at all safely. From there, we then also need to figure out whether or not it can be exposed to JS at all - which would seem to significantly impact any discussion here.