w3c / resource-timing

Resource Timing
https://w3c.github.io/resource-timing/

Provide a TAO-bypass for Navigation timing #220

Open yoavweiss opened 4 years ago

yoavweiss commented 4 years ago

Navigation Timing is sharing many attribute definitions with Resource Timing, but doesn't use TAO, as it makes no sense in NT's context. We need to provide an internal flag that enables NT to reuse those definitions without requiring TAO. See https://github.com/w3c/navigation-timing/issues/117

caribouW3 commented 4 years ago

Could you be more specific about what you mean by "it makes no sense in NT's context"? TAO plays a role for workerStart (see w3c/navigation-timing#118). The suggested flag would apply only to some definitions, right?

yoavweiss commented 4 years ago

The reason "it makes no sense" is that TAO checks whether the destination origin is OK with exposing timing, whereas for NT the destination origin is also the one with access to the data. workerStart in the context of NT is always the origin's own SW (since it should be the last one), so it doesn't make sense for it to opt in to itself.

Does that make sense?
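To make the argument concrete, here is a minimal sketch of a TAO-style check. It is a simplification for illustration only: the function name `passesTaoCheck` and its parameters are assumptions, and the real algorithm also has to account for redirect chains, which this sketch ignores.

```typescript
// Simplified, hypothetical sketch of a Timing-Allow-Origin (TAO) check.
// It ignores redirects and other details of the real algorithm.
function passesTaoCheck(
  taoHeader: string | null, // raw Timing-Allow-Origin value, or null if absent
  resourceOrigin: string, // origin that served the response
  requesterOrigin: string, // origin that wants to read the timing data
): boolean {
  // Same-origin responses never need an opt-in.
  if (resourceOrigin === requesterOrigin) return true;
  if (taoHeader === null) return false;
  // The header is a comma-separated list of origins, or "*".
  const values = taoHeader.split(",").map((value) => value.trim());
  return values.includes("*") || values.includes(requesterOrigin);
}

// Navigation Timing case: the document's origin is also the reader,
// so asking it to opt in to itself adds nothing.
const navigationCase = passesTaoCheck(null, "https://example.com", "https://example.com");

// Cross-origin subresource without the header: timing stays hidden.
const crossOriginCase = passesTaoCheck(null, "https://cdn.example", "https://example.com");
```

In the navigation case the two origins are always the same, which is why a destination-side opt-in has nothing to protect there.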

caribouW3 commented 4 years ago

Indeed, I had missed the clarification in issue 100. So that would add workerStart to the list in 117 I suppose.

noamr commented 3 years ago

Doesn't https://github.com/whatwg/html/pull/7105 fix this?

annevk commented 3 years ago

It does seem like in a redirect chain of A to B to C, A and B could opt-in to sharing their timing data. So C would get to know how these redirects impacted it. That's a slightly different model from TAO though so perhaps it needs its own design and header.
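As a rough illustration of that model, the sketch below treats redirect timing as exposable to the final document only if every earlier hop has opted in. Everything here is hypothetical: no such header exists, and `RedirectHop`, `allowedOrigins`, and `canExposeRedirectTiming` are names invented for this example.

```typescript
// Hypothetical model of one hop in a navigation redirect chain.
interface RedirectHop {
  origin: string;
  // Origins named by an assumed opt-in header on this hop's redirect response.
  allowedOrigins: string[];
}

// Redirect timing is exposed to the final origin only if every hop
// is either same-origin with it or has opted in to sharing with it.
function canExposeRedirectTiming(hops: RedirectHop[], finalOrigin: string): boolean {
  return hops.every(
    (hop) =>
      hop.origin === finalOrigin ||
      hop.allowedOrigins.includes("*") ||
      hop.allowedOrigins.includes(finalOrigin),
  );
}

// A redirects to B, which redirects to C; both A and B opt in.
const chain: RedirectHop[] = [
  { origin: "https://a.example", allowedOrigins: ["https://c.example"] },
  { origin: "https://b.example", allowedOrigins: ["*"] },
];
const allOptedIn = canExposeRedirectTiming(chain, "https://c.example");

// Adding a hop that stays silent hides the redirect timing again.
const oneHopSilent = canExposeRedirectTiming(
  [...chain, { origin: "https://d.example", allowedOrigins: [] }],
  "https://c.example",
);
```

Note the difference from TAO: here the opt-in comes from the intermediate origins rather than from the destination.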

noamr commented 2 years ago

So, the TAO bypass is important first for being able to expose redirectStart and redirectEnd, but it might also be a way forward with navigation-timing#160.

Suggesting one of three options:

RTAO

A new header, similar to TAO, that exposes this information and could in the future also serve as an opt-in for getting the navigation start time when cross-origin redirects are involved. This is perhaps the simplest option, but the least extensible.

Timing-Allow-Origin: *
Redirect-Timing-Allow-Origin: https://some-origin.com

TAS

A new header specifying the scope of the timing allowed, with a list of values resource | redirect | ..., in conjunction with Timing-Allow-Origin, like so:

Timing-Allow-Origin: https://example.com
Timing-Allow-Scope: resource, redirect

This is similar to CORS, kind of like Access-Control-Allow-Methods

TAP

A parameterized header, in place of TAO, similar to Content-Security-Policy, like:

Timing-Allow-Policy: same-site redirect, https://other-trusted-site.com resource, * none

noamr commented 2 years ago

OK, so a more detailed proposal based on the above and the WG discussion, to get this kicked off:

Resource-Timing-Policy: 1*<policy-directive>
<policy-directive>: <directive> <source>;
<directive>: resource | connection | exchange | redirect
<source>: <url-with-wildcards> / 'same-origin' / 'same-site' / 'none' / 'all' / 'nonce-{nonce}'

For example:

Resource-Timing-Policy: resource 'all'; connection 'same-site'; redirect '*.amazon.com'; exchange 'nonce-sha256-defasdgase123'

Each directive represents a group of attributes in resource timing:

redirect
  redirectStart
  redirectEnd
  // in the future: list of redirect URLs

connection
    nextHopProtocol;
    domainLookupStart;
    domainLookupEnd;
    connectStart;
    connectEnd;
    secureConnectionStart;

exchange
    fetchStart
    requestStart
    responseStart

resource
   encodedBodySize
   decodedBodySize
   transferSize

The following are always exposed, as they are either not TAO-protected or are same-origin anyway:

  initiatorType
  responseEnd
  workerStart // maybe this should move to navigation timing?
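
The grammar and attribute groups above could be read roughly as follows. This is only a sketch of the proposal as written in this comment: `parseResourceTimingPolicy`, `ATTRIBUTE_GROUPS`, and the decision to keep source expressions as raw strings are assumptions, and how a source is matched against an origin is left out because the proposal does not define it.

```typescript
// Hypothetical parser for the proposed Resource-Timing-Policy header.
type Directive = "resource" | "connection" | "exchange" | "redirect";

const DIRECTIVES: readonly string[] = ["resource", "connection", "exchange", "redirect"];

// Attribute groups as listed in the proposal.
const ATTRIBUTE_GROUPS: Record<Directive, readonly string[]> = {
  redirect: ["redirectStart", "redirectEnd"],
  connection: [
    "nextHopProtocol",
    "domainLookupStart",
    "domainLookupEnd",
    "connectStart",
    "connectEnd",
    "secureConnectionStart",
  ],
  exchange: ["fetchStart", "requestStart", "responseStart"],
  resource: ["encodedBodySize", "decodedBodySize", "transferSize"],
};

function isDirective(name: string): name is Directive {
  return DIRECTIVES.includes(name);
}

// Splits the header on ";" and maps each known directive to its
// source expression, kept verbatim (quotes included).
function parseResourceTimingPolicy(header: string): Map<Directive, string> {
  const policy = new Map<Directive, string>();
  for (const part of header.split(";")) {
    const tokens = part.trim().split(/\s+/);
    const name = tokens[0] ?? "";
    const source = tokens[1];
    // Skip empty parts, unknown directives, and directives without a source.
    if (!isDirective(name) || source === undefined) continue;
    policy.set(name, source);
  }
  return policy;
}

const examplePolicy = parseResourceTimingPolicy(
  "resource 'all'; connection 'same-site'; redirect '*.amazon.com'; exchange 'nonce-sha256-defasdgase123'",
);

// Attributes governed by the "connection" directive and its source.
const connectionSource = examplePolicy.get("connection");
const connectionAttributes = ATTRIBUTE_GROUPS.connection;
```

A consumer would then decide, per directive, whether the requesting origin matches the source before exposing that group's attributes.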

noamr commented 2 years ago

Speaking with @yoavweiss about this and reading through previous correspondence with security folks, I want to take a different approach.

Instead of making a complex opt-in mechanism for TAO because it got too big, let's remove things from TAO and make the problem smaller.

This is possible thanks to the new work on attribution reporting, which would allow us to report some of the more sensitive metrics in aggregate. That would mitigate the exposure of user data while still allowing providers to understand their performance and regressions at scale.

Context:

Thus, we can divide the different metrics into four policies:

Metric No TAO TAO Same-Origin Attribution Reporting CORS/CORP
redirectStart ✔️ ✔️ when manual
redirectEnd ✔️ ✔️ ✔️ when manual
fetchStart ✔️ ✔️ ✔️
domainLookupStart ✔️
domainLookupEnd ✔️
nextHopProtocol ✔️
connectStart ✔️ ✔️
secureConnectionStart ✔️ ✔️
connectEnd ✔️ ✔️
requestStart ✔️ ✔️ ✔️
responseStart ✔️ ✔️ ✔️ ✔️
responseEnd ✔️ ✔️ ✔️ ✔️ ✔️
encodedBodySize ✔️ ✔️ ✔️
decodedBodySize ✔️ ✔️ ✔️
transferSize ✔️ ✔️ ✔️

The nice thing about this is that it leaves TAO as something very limited, and we don't have to find a new complex opt-in. When new metrics are proposed, we can see where they fit in the table, and deal with the issue of a new opt-in when and if it arises.

yoavweiss commented 2 years ago

Thanks @noamr for the great summary! I agree this is where we want to end up in the long term.

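Random comments:

  • At least some of the TAO-only attributes in your table should IMO also be enabled for CORS, as they are visible to fetch().

  • You left out nextHopProtocol, which is contentious, and may need to be in the same bucket as DNS timing.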

Otherwise, I think we can split our path to that future into multiple chunks:

1) Expose the size attributes to CORS/CORP-enabled content and estimate the data loss from removing these attributes from TAO. Proceed with the latter carefully, perhaps while working with various 3P providers that currently TAO their content to also CORP-enable it.
2) Prototype aggregated perf metrics reporting with some of the primitives exposed as part of attribution reporting. Ideally, we won't need new primitives for it, but we might.
3) Once we have a concept of what aggregated reporting may look like, and it's something that's deployed for attribution reporting reasons, we can start exposing more info in those contexts and have RUM providers experiment with that.
4) Assuming the experiments are successful, we'd be able to remove the necessary attributes from TAO.

One risk with (3) would be implementation in some browsers but not others. I'm hoping that starting this conversation early can help us avoid that risk.

I'd love opinions of RUM providers on this path (@nicjansma, @cliffcrocker), as well as other vendors (@bdekoz, @sefeng211, @achristensen07), and web-app-sec folks (@mikewest, @camillelamy).

camillelamy commented 2 years ago

Thanks! I like the direction this proposal is going. In particular, removing domainLookupStart and domainLookupEnd from TAO is a good thing, as they relate to the state of the user's network, which is not something an origin should have access to.

noamr commented 2 years ago

> Thanks @noamr for the great summary! I agree this is where we want to end up in the long term.
>
> Random comments:
>
>   • At least some of the TAO-only attributes in your table should IMO also be enabled for CORS, as they are visible to fetch().

I think this is only true for responseStart. requestStart can expose connection times, it's not currently exposable via fetch AFAIK.

>   • You left out nextHopProtocol, which is contentious, and may need to be in the same bucket as DNS timing.

Fixed

yoavweiss commented 2 years ago

> requestStart can expose connection times, it's not currently exposable via fetch AFAIK.

Agree on requestStart, although it doesn't necessarily equal the connectEnd time, as there could be delays between the two. OTOH, I believe redirectEnd and fetchStart are exposed to fetch (with "manual" redirect mode), so they should be available to CORS responses.

noamr commented 2 years ago

> > requestStart can expose connection times, it's not currently exposable via fetch AFAIK.
>
> Agree on requestStart, although it doesn't necessarily equal the connectEnd time, as there could be delays between the two. OTOH, I believe redirectEnd and fetchStart are exposed to fetch (with "manual" redirect mode), so they should be available to CORS responses.

I fixed it, though maybe with manual redirect mode redirectStart and redirectEnd are less meaningful as they're equivalent to startTime & responseEnd.

annevk commented 2 years ago

It's not clear to me we want CORP to mean anything beyond "can be Spectre attacked". We had a long discussion about this elsewhere and I don't think we were able to come up with a principled model around it.

noamr commented 2 years ago

> It's not clear to me we want CORP to mean anything beyond "can be Spectre attacked". We had a long discussion about this elsewhere and I don't think we were able to come up with a principled model around it.

I'm actually of the same opinion, but wanted to keep that in mind in case that discussion resolves. The discussion was in this Google Doc, took me a while to find it.

/cc @yoavweiss

annevk commented 2 years ago

It's also at #240 and I vaguely recall some other GitHub discussion, but I cannot find it easily.