General approach to capability negotiation

samuelweiler commented 3 years ago

I thank the editors for what appears to be an excellent fingerprinting analysis. This is exactly the sort of thing I'm looking for in specs.

As a general thing, why are we exposing device capabilities to the app for purposes of negotiation? Couldn't we instead have sites expose available media formats and have browsers (perhaps in a way not exposed to the application) pick the one they like best? That way a browser wishing to be more privacy preserving could simply make a consistent choice, without having to fake an answer to this API, as recommended in https://w3c.github.io/media-capabilities/#decoding-encoding-fingerprinting.

chcunningham commented 3 years ago

I'm thrilled the fingerprinting analysis is good.

This section of the explainer lays out the design philosophy for the current API shape. It mentions your idea as well as a potential follow-up that could work in tandem with this design (so far, not something we have pursued). There are some cases where having the browser pick works well. For instance, <video> may have multiple nested <source> tags, and the UA does choose between those. But modern video APIs like MSE, EME, and WebRTC have increasingly moved in the direction of letting sites choose what to stream, and we tried to align our design with that direction.

Having the browser say which format it prefers is sometimes still compatible with the newer APIs. For instance, with MSE as used by sites like YouTube, this could work fine. But with EME and WebRTC, it's more complicated. For EME, a site like Netflix may balance the most performant stream configuration against the most secure stream configuration. When these are aligned, the choice is easy. But they are not always aligned, and the site is in a better position to break a tie. With WebRTC, you may have a preferred format, but you have to participate in a negotiation with your peers to arrive at a format that everyone supports. An API that can tell you info about each possible format is better suited to populating the format negotiation ladder.
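To make the "ladder" point concrete, here is a minimal TypeScript sketch of how a site might turn per-format answers into an ordered list. It is illustrative only: the type names, `scoreCandidate`, `buildLadder`, and the scoring policy are assumptions made for this example, not anything defined by the spec. Only the three booleans (`supported`, `smooth`, `powerEfficient`) mirror what `decodingInfo()` reports.

```typescript
// Local stand-in for the result of navigator.mediaCapabilities.decodingInfo().
// The three booleans mirror the fields the API reports; the type name is ours.
interface SketchDecodingResult {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// One candidate stream format plus the answer the UA gave for it (hypothetical shape).
interface CandidateResult {
  contentType: string;
  result: SketchDecodingResult;
}

// Hypothetical site policy: smooth playback matters most, power efficiency second.
function scoreCandidate(r: SketchDecodingResult): number {
  return (r.smooth ? 2 : 0) + (r.powerEfficient ? 1 : 0);
}

// Drop unsupported formats and order the rest best-first.
// filter() returns a new array, so the caller's input is not reordered.
function buildLadder(candidates: CandidateResult[]): CandidateResult[] {
  return candidates
    .filter(function (c) { return c.result.supported; })
    .sort(function (a, b) { return scoreCandidate(b.result) - scoreCandidate(a.result); });
}
```

In a browser, each `result` would come from awaiting `navigator.mediaCapabilities.decodingInfo({ type: "media-source", video: { contentType, width, height, bitrate, framerate } })`. A site with different priorities (for example, weighing stream security for EME) would swap in its own scoring function.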

samuelweiler commented 3 years ago

@chcunningham, Thank you!

I'm thrilled the fingerprinting analysis is good.

This section of the explainer lays out the design philosophy for the current API shape. It mentions your idea as well as a potential follow-up that could work in tandem with this design (so far, not something we have pursued).

This isn't quite as detailed as I had hoped for. As you say, it mentions the possibility of the UA picking, but it says little about why that path isn't being chosen.

... But modern video APIs like MSE, EME, and WebRTC have increasingly moved in the direction of letting sites choose what to stream, and we tried to align our design with that direction.

If "we're following the example" is the argument, then I'd like to push back. I'm not convinced that these others got it right, and I'd like to take a fresh look here.

Having the browser say which format it prefers is sometimes still compatible with the newer APIs. For instance, with MSE as used by sites like YouTube, this could work fine. But with EME and WebRTC, it's more complicated. For EME, a site like Netflix may balance the most performant stream configuration against the most secure stream configuration. When these are aligned, the choice is easy. But they are not always aligned, and the site is in a better position to break a tie.

The user might have an opinion in this case, also. For example, the user might have low power availability and might prefer the lower-power choice. As you point out, there may be misalignment. Given that, I would argue that the UA - as the "user's agent" - is in the better place to break the tie, not the site.

With WebRTC, you may have a preferred format, but you have to participate in a negotiation with your peers to arrive at a format that everyone supports. An API that can tell you info about each possible format is better suited to populating the format negotiation ladder.

The more-than-two-party case (of WebRTC) seems different than the two-party case, and my understanding was that this API is for two-party cases, right? In that case, having the site provide information about what it supports - rather than the UA supply that information - would seem to provide sufficient completeness, right?

chcunningham commented 3 years ago

If "we're following the example" is the argument, then I'd like to push back. I'm not convinced that these others got it right, and I'd like to take a fresh look here.

The fresh look is welcome, but I think the proposed design is not feasible at this time. The MediaCapabilities API is widely implemented and used. We have an opportunity to make additions, improvements, refinements, etc... but we cannot make a breaking change of this magnitude.

The user might have an opinion in this case, also. For example, the user might have low power availability and might prefer the lower-power choice. As you point out, there may be misalignment. Given that, I would argue that the UA - as the "user's agent" - is in the better place to break the tie, not the site.

Sites may offer users this choice while also factoring in their secret sauce for whatever they think makes the best user experience.

The more-than-two-party case (of WebRTC) seems different than the two-party case, and my understanding was that this API is for two-party cases, right? In that case, having the site provide information about what it supports - rather than the UA supply that information - would seem to provide sufficient completeness, right?

For WebRTC usage, this API is not limited to two parties. The API can describe the send and receive capabilities of the local machine. The app could then exchange this information with the N parties in a conference call setup as part of format negotiation.
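As a rough sketch of the N-party case, suppose each participant shares a list of formats it can send and receive, and the app then chooses, per sender, a format all other participants can decode. Everything here (`PeerCaps`, `pickSendFormat`, the use of plain format strings) is a hypothetical simplification for illustration. Real WebRTC negotiation goes through SDP and is considerably more involved.

```typescript
// Capabilities one participant advertises to the others (hypothetical shape).
interface PeerCaps {
  id: string;
  send: string[];     // formats this peer can encode, in its own preference order
  receive: string[];  // formats this peer can decode
}

// Pick the sender's most-preferred format that every other participant can decode.
function pickSendFormat(sender: PeerCaps, everyone: PeerCaps[]): string | null {
  const receivers = everyone.filter(function (p) { return p.id !== sender.id; });
  for (let i = 0; i < sender.send.length; i++) {
    const fmt = sender.send[i];
    const allCanDecode = receivers.every(function (r) {
      return r.receive.indexOf(fmt) !== -1;
    });
    if (allCanDecode) return fmt;
  }
  return null; // no common format; the app has to handle this case itself
}
```

The point of the sketch is that the choice depends on information from all N participants, which the local UA alone does not have.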

samuelweiler commented 3 years ago

The fresh look is welcome, but I think the proposed design is not feasible at this time. The MediaCapabilities API is widely implemented and used. We have an opportunity to make additions, improvements, refinements, etc... but we cannot make a breaking change of this magnitude.

Please correct me if I'm wrong, but isn't this the first time the WG has sought the Privacy IG's review of this spec?

chcunningham commented 3 years ago

You are correct, this is the first time review has been requested. I accept responsibility for the delay in making the request. This spec was my first time navigating the W3C process.

My aim in the comment about "feasibility" is to provide important background. I'm happy to continue discussion on the merits of various designs.

pes10k commented 3 years ago

Just to second @samuelweiler (twice):

I think the substance of Sam's issue is important, given that for some users the values here will be highly identifying for browser fingerprinting. If the approach Sam is suggesting isn't workable, then other fingerprinting protections are needed in the spec. I appreciate and agree with Sam that the text discussing fingerprinting issues is great, but the spec also needs normative protections against the fingerprinting risk.

I think the process points I read in Sam's comment are important too. The purpose of reviews is to identify privacy risks in specs, and make sure they're addressed before things move to REC. Doubly so when the spec touches on topics called out as needing extra care by the TAG Design Principles. I see Sam identifying a place where the current spec doesn't seem to follow the least-power principle the TAG suggests (or align with the fingerprinting risks PING is generally concerned with).

@chcunningham are you saying that the WG isn't interested in moving the spec in a direction more in line with TAG guidance (and reducing fingerprinting risk)? Or that a capability negotiation approach (or something else more in line with the TAG principles) sounds good, but would need to be achieved in a different way than has been discussed in the thread so far? Or that it's simply too late to make any significant changes (in this respect) at all?

chcunningham commented 3 years ago

@chcunningham are you saying that the WG isn't interested in moving the spec in a direction more in line with TAG guidance (and reducing fingerprinting risk)?

No. I am happy to make changes that reduce fingerprinting risk. I think being transparent about feasibility of proposed changes is essential to having a good faith conversation about making improvements.

Or that a capability negotiation approach (or something else more in line with the TAG principles) sounds good, but would need to be achieved in a different way than has been discussed in the thread so far?

Feasibility aside, I do not think the capability approach sounds good. I gave a few examples of issues in my earlier comments. IMO those examples demonstrate that the current API does align with the least-power principle (more power was needed).

chcunningham commented 3 years ago

given that for some users the values here will be highly identifying for browser fingerprinting

Can you elaborate on this? I'd like to explore other mitigations.

mwatson2 commented 3 years ago

@samuelweiler wrote:

Couldn't we instead have sites expose available media formats and have browsers (perhaps in a way not exposed to the application) pick the one they like best?

At least in our system (Netflix), and I imagine in others, the available media formats vary significantly by title and require a small amount of work server-side to compute. In our case, also, computing the CDN URLs for the various streams involves a larger amount of server-side work. At the moment we do these tasks in a single network request. These tasks can be done speculatively to a certain extent (when there are signals as to which title might be presented next), but we would not want to waste resources on the CDN calculations for stream formats that are not supported by the device. If they cannot be done speculatively, then doing them in a single request is desirable from a responsiveness point of view, rather than one request to get the available formats and another to get the URLs and other metadata for the chosen format.

If I understand correctly, an API that allows the browser to choose a format from a provided list exposes all the same information about device capabilities, since it could be called repeatedly with different lists, so the privacy advantage of that approach is only that those requests could be rate limited and abuse might then be easier to detect. A site like Netflix, though, would need to call this frequently at first as we drive speculative preparation for titles visible in the gallery, for example, so heavy throttling could have a user experience impact and differentiating between normal usage and abuse may not be so easy anyway.
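A small TypeScript sketch of the repeated-call argument, under the assumption that a "browser picks" API takes a list of offered formats and returns the chosen one (or nothing). `BrowserPick`, `makeBrowserPick`, and `probeSupport` are hypothetical names invented for this illustration; no such API exists in the spec.

```typescript
// Hypothetical "browser picks" API: given offered formats, return the chosen one or null.
type BrowserPick = (offered: string[]) => string | null;

// Stand-in UA for the sketch: picks the first offered format it supports.
function makeBrowserPick(supportedByDevice: string[]): BrowserPick {
  return function (offered: string[]): string | null {
    for (let i = 0; i < offered.length; i++) {
      if (supportedByDevice.indexOf(offered[i]) !== -1) return offered[i];
    }
    return null;
  };
}

// A site that wants the full capability set can offer one format at a time.
// Each call reveals one bit, so N candidates cost N calls.
function probeSupport(pick: BrowserPick, candidates: string[]): string[] {
  return candidates.filter(function (fmt) {
    return pick([fmt]) === fmt;
  });
}
```

The probe recovers the same support set a direct capability query would, at a cost of one call per candidate. That call count is the only thing a rate limit or abuse detector would have to work with.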

It should always be possible for privacy-sensitive browsers to monitor whether sites request capability information (in general, not just this API) and then do not go on to use the capabilities detected. Browsers can also choose to advertise only a common baseline capability and offer users the choice to expose more information only when a site actually uses that capability.
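Both mitigations can be sketched in a few lines of TypeScript. This is only an illustration of the idea, with assumed names (`answerQuery`, `unusedQueries`) and plain format strings; how a real browser would decide what counts as "baseline" or as "using" a capability is left open.

```typescript
// Decide what to tell a site about one format.
// Without opt-in, only formats in the common baseline are ever reported as supported.
function answerQuery(
  format: string,
  reallySupported: string[],
  baseline: string[],
  userOptedIn: boolean
): boolean {
  const real = reallySupported.indexOf(format) !== -1;
  if (userOptedIn) return real;
  return real && baseline.indexOf(format) !== -1;
}

// Monitoring side: formats a site asked about but never went on to use.
// A long list here could be treated as a signal of fingerprinting rather than playback.
function unusedQueries(queried: string[], used: string[]): string[] {
  return queried.filter(function (fmt) {
    return used.indexOf(fmt) === -1;
  });
}
```

With a baseline of just `"h264"`, for example, two devices that differ only in AV1 support would give identical answers until the user opts in.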

chrisn commented 10 months ago

Discussed in Media WG meeting 12 December 2023 (minutes). Next step: update our privacy considerations.

mwatson2 commented 10 months ago

Sorry, I missed the discussion last month. Happy to help draft text for the streaming case, based on the note above. I can prepare a PR if no one else is doing it.

chrisn commented 10 months ago

@mwatson2 Thank you, that would be really helpful. It sounds like a good way to start, then we can also add the WebRTC considerations from @aboba.

w3c/media-capabilities, issue #176