patcg-individual-drafts / topics

The Topics API
https://patcg-individual-drafts.github.io/topics/
Other
605 stars 199 forks source link

Sec-Browsing-Topics header format seems a bit awkward #183

Closed domenic closed 1 year ago

domenic commented 1 year ago

In particular, emitting the version, but only sometimes, seems error-prone for consumers. It means consumers need to do extra work to reconstruct the correct version + topic pairs, on the server side, using a sort of stateful parser loop. They can't just use out-of-the-box HTTP structured field parsers.

Another example failure mode would be if something in their sever or proxy stack sorts HTTP header lists, this could be disastrous. E.g. it could turn Sec-Browsing-Topics: 5;v="x", 3, 1;v="y" (which is equivalent to {(1, "y"), (3, "x"), (5, "x")}) into Sec-Browsing-Topics: 1;v="y", 3, 5;v="x" (which is equivalent to {(1, "y"), (3, "y"), (5, "x")}).

jkarlin commented 1 year ago

The trade-off here is bytes on the wire (~15 bytes per version) vs ease-of-use. Combined with the desire to pad the header to ensure that we're not leaking the number of topics sent, with your proposal we'd always need to send 3 versions (45 bytes for versions per request). The folks that we expect to use the API is pretty small in number (ad-tech) and generally pretty experienced. I agree that parsing it requires slightly more logic than simply reading a list, but it's pretty rudimentary.

Another example failure mode would be if something in their sever or proxy stack sorts HTTP header lists, this could be disastrous.

If a proxy were to reorder the list that seems bad on many levels? Sometimes ordering matters.

In the end I don't feel strongly here and could be persuaded either way, but we'll wind up having large padding buffers on many requests as a result.

domenic commented 1 year ago

When you say "many requests", you just mean the requests which have opted in to getting this data, right? As you say, this is a pretty small number of users. Given that those are generally for multi-kilobyte documents, it seems to me like penny-pinching to worry about 45 bytes.

jkarlin commented 1 year ago

The number of users is small, but their reach is enormous. Let's assume at least a few requests per page on a significant fraction of page loads, ~50%?

domenic commented 1 year ago

It still seems not worth worrying about, compared to the transfer sizes of ads themselves.

domenic commented 1 year ago

In an offline chat, @jyasskin suggested using inner lists to group:

Sec-Browsing-Topics: (5 3);v="x", (1);v="y"

This is nice! However we then have to extend it with the padding idea from https://github.com/patcg-individual-drafts/topics/pull/186, which makes it more awkward. The nice thing about #186 is that consumers know they can just throw away the entire map entry with key p.

I think the following is a reasonable way of getting both benefits:

Sec-Browsing-Topics: (5 3);v="x", (1);v="y", ();v="PPPPPPPPPPPP"

This seems like it would be pretty easy to parse because it's effectively saying "there are no topics with version = PPPPPPPPPPPP." Full analysis below.

Detailed comparison of parsing ease Assume our software's goal is to create a list of `{ topic, version }` tuples. ### Simplest format with padding ``` Sec-Browsing-Topics: t=(5;v="x" 3;v="x" 1;v="y"), p="PPP" ``` ```js const parsed = genericStructuredHeaderParser(headerValue); let result = []; for (const item of parsed["t"]) { result.push(item.value, item.params["v"]); } ``` ### Abbreviated format with padding (removes repeated versions) ``` Sec-Browsing-Topics: t=(5;v="x" 3 1;v="y"), p="PPP" ``` ```js const parsed = genericStructuredHeaderParser(headerValue); let result = []; let lastVersion; for (const item of parsed["t"]) { if (item.params["v"]) { result.push(item.value, item.params["v"]); lastVersion = item.params["v"]; } else { result.push(item.value, lastVersion); } } ``` ### Inner list with groups with padding ``` Sec-Browsing-Topics: (5 3);v="x", (1);v="y", ();v="PPPPPPPPPPPP" ``` ```js const parsed = genericStructuredHeaderParser(headerValue); let result = []; for (const innerList of parsed) { const version = innerList.params["v"]; for (const topic of innerList) { result.push(topic.value, version); } } ``` (The inner loop is a no-op for the zero-length padding inner list.)
jkarlin commented 1 year ago

I also like Jeffrey's suggestion. How about a compromise that keeps the dictionary which makes it a bit more expicit that it's for padding and uses the same number of characters when whitespace is removed? Having a dictionary also allows us to add extra keys in the future if necessary which is a nice-to-have.

Sec-Browsing-Topics: t=((5 3);v="x", (1);v="y"),p=PPPPPPPPPPPP

const parsed = genericStructuredHeaderParser(headerValue);
let result = [];

for (const innerList of parsed["t"]) {
  for (const topic of innerList) {
    result.push(topic.value, innerList.params["v"]);
  }
}

Correction: Not exactly the same number of chars, but quite close.

domenic commented 1 year ago

Inner lists can't have inner list members :(

jkarlin commented 1 year ago

Ah, right. That's annoying.

Okay, I'd prefer to rename the pad param to 'p' in your example, like so:

Sec-Browsing-Topics: (5 3);v="x", (1);v="y", ();p="P00000000000"

And then the code would be:

const parsed = genericStructuredHeaderParser(headerValue);
let result = [];
for (const innerList of parsed) {
  for (const topic of innerList) {
    result.push(topic.value, innerList.params["v"]);
  }
}

There is no longer a guaranteed v param on each topic list. Yes, it's true that code might then explode if they try to access v assuming it's always there, but that's preferable imho to accidentally using the padding as a version.

jkarlin commented 1 year ago

Yao points out that structured headers do support '.' and ':' in tokens.

dmarti commented 1 year ago

Why does this header have the Sec- prefix? (I don't see anything explaining that under https://github.com/patcg-individual-drafts/topics#privacy-and-security-considerations )

jkarlin commented 1 year ago

The sec- prefix protects the header from being modified by script.

dmarti commented 1 year ago

@jkarlin See https://github.com/patcg-individual-drafts/topics/issues/25 and discussion at https://github.com/patcg-individual-drafts/topics/blob/55153e6166c4aa574ba20226a83bf7e693c07ed6/meetings/2023-06-05-minutes.md -- trying to figure out how a user is going to be able to use a browser extension to filter Topics API data (as they can already do now with cookie management extensions)

jkarlin commented 1 year ago

If a user doesn't want/like topics, then they should simply disable it, which extensions can do. I don't think it makes sense to provide extension support simply to modify topics.

martinthomson commented 1 year ago

The sec- prefix protects the header from being modified by script.

Yes, but that is a factual statement. You didn't address the underlying question, which was whether there was a need for the value to be protected from modification. Is there some case where a page gains access to a topic value and makes requests - in a context where adversarial script might also be present - and wants to ensure that the adversarial script can't change the topics?

Or is this just the case that all new header fields gain a sec- prefix because we don't want to think about the consequences if we mistakenly omit sec- and someone discovers an attack?

michaelkleber commented 1 year ago

Just as Martin says, we've heard several times during the development of the various Privacy Sandbox APIs that potential ad tech users are concerned about the risk that a script on the page might try to maliciously modify headers for some fraud purpose. A Sec- header seems like a prudent safeguard in that context.

amicus-veritatis commented 1 year ago

A Sec- header seems like a prudent safeguard in that context.

Maybe, But I highly doubt that sec- is an appropriate header for browsing-topics. As I understand, The primary purpose of the sec- prefix is to prevent malicious websites from convincing user agents to send forged metadata along with requests. It is a security-related prefix indeed.

If it has to be a forbidden header, I propose a new, industry-specific prefix advertising- or analytics-.

domenic commented 1 year ago

As the person who originally opened this issue, I consider it closed by https://github.com/patcg-individual-drafts/topics/commit/9d8fec1e3205937b3d5654dc0dd07ee4306c111f. It seems like a new discussion has started about the Sec- prefix, which does not seem relevant to the OP. Probably that discussion is best moved to another issue?

dmarti commented 1 year ago

@domenic Done, thank you: https://github.com/patcg-individual-drafts/topics/issues/207