w3c / webrtc-stats

WebRTC Statistics
https://w3c.github.io/webrtc-stats/

The HW exposure check does not solve Cloud Gaming use cases #730

Closed henbos closed 1 year ago

henbos commented 1 year ago

After some back and forth, the current HW exposure check only considers contexts that have capture (getUserMedia). This obviously does not work for Cloud Gaming.

What's even more frustrating is that MediaCapabilities already exposes the HW capability information, but it does not expose what is currently used (e.g. powerEfficientDecoder or decoderFallback). The specs are inconsistent in how much they care about exposure information. The MC privacy consideration section says that there is very little information exposed so it is not a big problem, but that a user agent might want to throttle or give vague answers, which is very much up for interpretation. Why would getStats() be less restrictive than MC?
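To make the gap between capability and current usage concrete, here is a minimal TypeScript sketch. It assumes `capability` has the shape returned by `navigator.mediaCapabilities.decodingInfo()` and that `stats` holds entries taken from `getStats()`. The interface names, the `summarizeDecoder` helper, and its result labels are hypothetical and exist only for illustration.

```typescript
// Result shape of navigator.mediaCapabilities.decodingInfo().
interface DecodingInfoLike {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// Subset of an "inbound-rtp" entry from RTCPeerConnection.getStats().
interface InboundRtpLike {
  type: string;
  kind?: string;
  powerEfficientDecoder?: boolean; // absent when the UA withholds it
}

// Hypothetical labels, not part of any spec.
type DecoderSummary =
  | "efficient-in-use"
  | "efficient-available-but-unused"
  | "no-efficient-path"
  | "usage-hidden";

function summarizeDecoder(
  capability: DecodingInfoLike,
  stats: InboundRtpLike[],
): DecoderSummary {
  // Find the first inbound video entry, if any.
  let video: InboundRtpLike | undefined;
  for (const entry of stats) {
    if (entry.type === "inbound-rtp" && entry.kind === "video") {
      video = entry;
      break;
    }
  }
  // MC answers "what could be used"; only getStats() answers "what is used".
  if (video === undefined || video.powerEfficientDecoder === undefined) {
    return "usage-hidden";
  }
  if (video.powerEfficientDecoder) {
    return "efficient-in-use";
  }
  return capability.powerEfficient
    ? "efficient-available-but-unused"
    : "no-efficient-path";
}
```

In a page, `capability` would come from a `decodingInfo()` query with `type: "webrtc"`, and `stats` from iterating the report returned by `pc.getStats()`. The "usage-hidden" case is the one a cloud gaming page without capture currently hits.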

Possible paths forward:

  1. Declare decoderFallback a weak fingerprinting vector and skip the HW exposure check for that particular metric (but not other metrics like powerEfficientDecoder).
  2. Reference MediaCapabilities.powerEfficient. Even if the steps are something vague like "let the UA decide", we would at least have something to reference, and we could say that "if the MC check passes, return true" in our own HW exposure algorithm.
  3. Other suggestions welcome?

In reality though, because MC is currently up to the UA which in practise appears to be implemented as "always expose", doing 2) would basically turn off all fingerprinting protection from getStats(). But at least the two specs would be consistent and we would have a placeholder algorithm (in MC spec) to update when necessary.
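The difference between the current check and options 1 and 2 can be sketched as follows. This is an illustration under stated assumptions, not spec text: the `Policy` names, the `ExposureContext` fields, and both helper functions are hypothetical, and `decoderFallback` is the proposed metric rather than an existing one. `mcWouldExpose` stands for "MC would return a defined powerEfficient value if asked".

```typescript
type Policy = "current" | "option1" | "option2";

interface ExposureContext {
  hasCaptureAccess: boolean; // the page obtained capture via getUserMedia()
  mcWouldExpose: boolean;    // assumption: the UA's MC check would pass
}

interface DecoderHwStats {
  powerEfficientDecoder: boolean;
  decoderFallback: boolean; // proposed metric, hypothetical here
}

// The HW exposure check. Option 2 adds "if the MC check passes, return true".
function hardwareAllowed(ctx: ExposureContext, policy: Policy): boolean {
  if (ctx.hasCaptureAccess) {
    return true;
  }
  return policy === "option2" && ctx.mcWouldExpose;
}

// Returns only the fields the page is allowed to see.
function visibleStats(
  stats: DecoderHwStats,
  ctx: ExposureContext,
  policy: Policy,
): Partial<DecoderHwStats> {
  const out: Partial<DecoderHwStats> = {};
  const allowed = hardwareAllowed(ctx, policy);
  if (allowed) {
    out.powerEfficientDecoder = stats.powerEfficientDecoder;
  }
  // Option 1 skips the HW check for decoderFallback only.
  if (allowed || policy === "option1") {
    out.decoderFallback = stats.decoderFallback;
  }
  return out;
}
```

For a cloud gaming page with no capture, "current" hides both fields, "option1" exposes only `decoderFallback`, and "option2" exposes both whenever the MC check passes, which is why an "always expose" MC implementation would effectively disable the gate.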

Let's discuss

henbos commented 1 year ago

@pes10k @alvestrand @vr000m @xingri @Diego-Perez-Botero

henbos commented 1 year ago

MC is currently up to the UA which in practise appears to be implemented as "always expose"

@drkron please comment if this was a fair characterization of MediaCapabilities HW exposure checks or if we do in fact take more steps to prevent leaking HW.

drkron commented 1 year ago

MC is currently up to the UA which in practise appears to be implemented as "always expose"

@drkron please comment if this was a fair characterization of MediaCapabilities HW exposure checks or if we do in fact take more steps to prevent leaking HW.

That seems like a fair characterization. From what I remember, the powerEfficient bit alone has not been considered to have privacy implications. The browser may in certain cases choose to set powerEfficient=true even though there's no HW support (e.g., low resolutions may be considered to be powerEfficient without HW support).
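A rough sketch of why powerEfficient is not a pure HW signal, assuming a UA that treats small frames as power efficient even in software. The pixel cutoff and the function name are invented for illustration and do not describe any particular browser.

```typescript
interface VideoConfig {
  width: number;
  height: number;
}

// Hypothetical cutoff: at or below this frame size, software decode
// is treated as power efficient. The value is an assumption.
const SOFTWARE_EFFICIENT_MAX_PIXELS = 640 * 360;

function reportPowerEfficient(
  hasHardwareDecoder: boolean,
  config: VideoConfig,
): boolean {
  if (hasHardwareDecoder) {
    return true;
  }
  // No HW support, but the stream is small enough to count as efficient.
  return config.width * config.height <= SOFTWARE_EFFICIENT_MAX_PIXELS;
}
```

Under such a heuristic, powerEfficient=true for a low-resolution query says nothing about whether HW is present.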

fippo commented 1 year ago

See https://w3c.github.io/webrtc-nv-use-cases/#game-streaming for the description of the use-case.

Note that attacks requiring a live connection, like heuristics about decode time, can be used, as Diego pointed out. So far we haven't seen such attacks, though, so I do not think we are gaining much by gating decoderFallback.

For cloud gaming the gamepad API could be taken into account as well but I assume this is the same single-digit percentage solution as microphone.

We should still take fullscreen (removed in #712) into account since it will prevent some fingerprinting, but maybe we need different levels of granularity?

Moving forward I'd rather go the opposite direction and have more information in encoderImplementation/decoderImplementation if getUserMedia et al give a strong signal.
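The decode-time heuristic mentioned above can be sketched as follows. It relies on the cumulative `framesDecoded` and `totalDecodeTime` (seconds) counters that inbound video stats already report; the helper names and the idea of a fixed threshold are assumptions for illustration, and a real threshold would depend on codec, resolution, and device.

```typescript
// Two cumulative counters from an inbound video stats entry.
interface DecodeSample {
  framesDecoded: number;   // frames decoded so far
  totalDecodeTime: number; // seconds spent decoding so far
}

// Average per-frame decode time in milliseconds between two samples,
// or undefined if no new frames were decoded.
function averageDecodeMs(
  prev: DecodeSample,
  curr: DecodeSample,
): number | undefined {
  const frames = curr.framesDecoded - prev.framesDecoded;
  if (frames <= 0) {
    return undefined;
  }
  return ((curr.totalDecodeTime - prev.totalDecodeTime) * 1000) / frames;
}

// Guess "software decode" when the average exceeds a caller-chosen
// threshold. The threshold is hypothetical and must be tuned.
function looksLikeSoftwareDecode(
  prev: DecodeSample,
  curr: DecodeSample,
  thresholdMs: number,
): boolean | undefined {
  const avg = averageDecodeMs(prev, curr);
  return avg === undefined ? undefined : avg > thresholdMs;
}
```

Because this needs only ungated counters and a live stream, gating decoderFallback does not stop a page that is already receiving video from making the same guess.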

henbos commented 1 year ago

For cloud gaming the gamepad API could be taken into account as well but I assume this is the same single-digit percentage solution as microphone.

That was suggested before and it would improve the situation, but I feel like it has the same problem as the GUM check, it restricts to certain devices... but you could have cloud gaming with a keyboard. It would seem rather arbitrary if you can have good HW usage if you plug in a controller but not if you use a keyboard.

IsFullscreen is also problematic

vr000m commented 1 year ago

I think going with a variation of option 2 in the original thread makes sense, i.e., if MediaCapabilities reveals the HW capabilities, we can piggyback on that information.

alvestrand commented 1 year ago

Suggestion: If the MediaCapabilities already shows a defined value of PowerEfficient, then all the fingerprint value of PowerEfficientEncoder / Decoder is already out there. The PowerEfficientEncoder is only adding information about later events on the system, which is not that useful for fingerprinting. (The check should be against what MC would return if asked, not about whether or not it has been exposed.)

alvestrand commented 1 year ago

So instead of guarding this with HardwareAllowed, guard it with MediaCapabilities.PowerEfficient.

henbos commented 1 year ago

Makes sense to just reference MC.powerEfficient, the intent should be clear

pes10k commented 1 year ago

Hi all, I don't know how overlapping the repos/specs are, so I wanted to point to my comment in the webrtc-stats spec that covers the same concerns as this issue.

aboba commented 1 year ago

Am proposing to expose this info in WebCodecs via an event: https://github.com/w3c/webcodecs/pull/645

pes10k commented 1 year ago

I see there are two parallel but related conversations going on. I've been mostly commenting on the PING/privacy review side of things at https://github.com/w3c/webrtc-stats/pull/732, but if it'd be helpful im also happy to discuss here too / instead

xingri commented 1 year ago

I see there are two parallel but related conversations going on. I've been mostly commenting on the PING/privacy review side of things at #732, but if it'd be helpful im also happy to discuss here too / instead

@pes10k Thank you for the response. I agree this might be a more proper place to discuss the alternatives. Regarding option 1, we previously had a discussion in https://github.com/w3c/webrtc-stats/pull/725.

pes10k commented 1 year ago

thank you for the link @xingri. Just to make sure i understand correctly, option1 then would leave powerEfficientDecoder as is (i.e. hardware check to access), but would add another property called decoderFallback that would not need hardware check?

if decoderFallback is (always or almost always) just the opposite of powerEfficientDecoder, im not sure thats a privacy improvement.

I think the approach in the spec of "group the fingerprinting relevant inputs together and require some high-touch event before the page can load them" is probably the right way to go. if cloud gaming folks need a way to get access to these stats that isn't getUserMedia() related, figuring out how to define that high-touch event might be the best way forward (for my 2c)

xingri commented 1 year ago

thank you for the link @xingri. Just to make sure i understand correctly, option1 then would leave powerEfficientDecoder as is (i.e. hardware check to access), but would add another property called decoderFallback that would not need hardware check?

That is correct.

if decoderFallback is (always or almost always) just the opposite of powerEfficientDecoder, im not sure thats a privacy improvement.

Basically, I agree with you that the two can be considered equivalent. However, as I understand it, fingerprinting protection is meant to prevent identification of the user agent. If that is correct, decoder fallback cannot be used to identify the user agent, since fallback is only triggered by an abnormal condition in the system and can be revoked once the system recovers. Because of that characteristic, I do not think it can be used for fingerprinting, which gives it an advantage over the power efficiency metric.

I think the approach in the spec of "group the fingerprinting relevant inputs together and require some high-touch event before the page can load them" is probably the right way to go. if cloud gaming folks need a way to get access to these stats that isn't getUserMedia() related, figuring out how to define that high-touch event might be the best way forward (for my 2c)

Will investigate it.

xingri commented 1 year ago

@pes10k Could you please review my feedback on the decoderFallback approach from the FP protection perspective once again?

Also could you share a little more detail about how to define the "high-touch event"? I have been looking for the information but I couldn't find the exact details so far.

pes10k commented 1 year ago

@pes10k Could you please review my feedback on the decoderFallback approach from the FP protection perspective once again?

Ah i see, can you say more about the kinds of cases that would cause decoderFallback to be true? I thought decoderFallback just meant "software renderer", which could be true for all sorts of persistent reasons (the device doesn't have relevant hardware, or the OS has been upgraded/changed and so the hardware isn't supported, etc). Could you give some other examples of when decoderFallback would be true, and which would be temporary vs semi-persistent?

Also could you share a little more detail about how to define the "high-touch event"?

Sorry, I didn't mean to mislead you, this isn't a well defined term. I just mean "some high level of interaction or user intent". The most common examples are dialogs and permission prompts, but those aren't the only ones

xingri commented 1 year ago

Ah i see, can you say more about the kinds of cases that would cause decoderFallback to be true? I thought decoderFallback just meant "software renderer", which could be true for all sorts of persistent reasons (the device doesn't have relevant hardware, or the OS has been upgraded/changed and so the hardware isn't supported, etc). Could you give some other examples of when decoderFallback would be true, and which would be temporary vs semi-persistent?

@pes10k Thank you for the response. The transitions of this metric may look like the following:

FYI, in the case of the Chromium browser, the fallback gets triggered for the following reasons:

enum class RTCVideoDecoderFallbackReason {
  kSpatialLayers = 0,
  kConsecutivePendingBufferOverflow = 1,
  kReinitializationFailed = 2,
  kPreviousErrorOnDecode = 3,
  kPreviousErrorOnRegisterCallback = 4,
  kConsecutivePendingBufferOverflowDuringInit = 5,
  kMaxValue = kConsecutivePendingBufferOverflowDuringInit,
};

xingri commented 1 year ago

@pes10k Could you share your opinion on the decoderFallback metric approach? Can we move forward with option 1 from the FP perspective?

xingri commented 1 year ago

@pes10k This is a gentle reminder that I am waiting for your feedback. Please let me know if you need more clarification on the decoderFallback metric.

pes10k commented 1 year ago

@xingri Im not familiar enough with this part of the code to know what all of those enum options mean, but if this means that decoderFallback would only depend on things that happened on the current site (such as the decoder failing previously during the session) then I think thats a fine, A+ approach.

If though the decoderFallback value is shared across sites, or across browser sessions, then i think the fingerprinting risk is possibly still here.

If its the second case, then i'll have to read up more on the kinds of things that could cause decoderFallback to be true and get back to you (which i could do by Monday at the latest)

xingri commented 1 year ago

@pes10k Thank you so much for the feedback. Fortunately, the decoderFallback is only effective on the current site during the session and is not shared throughout the browser session.

xingri commented 1 year ago

@henbos Considering the feedback from @pes10k, could we move forward with option 1? FYI, I have rebased the change in https://github.com/w3c/webrtc-stats/pull/725.

henbos commented 1 year ago

If though the decoderFallback value is shared across sites, or across browser sessions, then i think the fingerprinting risk is possibly still here.

The decoderFallback only reflects the current site's decoder usage and as such does not expose if other websites are experiencing fallback, only if the current site experiences fallback.

However, because one of the reasons that your decoder could fall back is that the HW is not available, if another website or application acquired the same HW before you did, this could result in decoderFallback when you try to instantiate your decoder (depending on HW limitations in number of concurrent instances).

How concerning is that to you @pes10k ?

xingri commented 1 year ago

However, because one of the reasons that your decoder could fall back is that the HW is not available, if another website or application acquired the same HW before you did, this could result in decoderFallback when you try to instantiate your decoder (depending on HW limitations in number of concurrent instances).

@henbos Thank you for the feedback. Based on my proposal, decoder fallback is only triggered when the decoder starts with hardware acceleration. Unless the session starts with hardware decoding, the decoder fallback will not be triggered.
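The rule described here can be sketched as a small per-session tracker. This is an illustration under assumptions, not Chromium's implementation: the class, its method names, and the two-valued `DecoderImpl` type are hypothetical. It also assumes, per the earlier comment, that the flag is revoked once hardware decoding resumes.

```typescript
type DecoderImpl = "hardware" | "software";

// Tracks decoderFallback for one session on one site.
class DecoderFallbackTracker {
  private startedWithHardware: boolean | undefined = undefined;
  private fallback = false;

  // Call whenever the active decoder implementation is (re)selected.
  onDecoderSelected(impl: DecoderImpl): void {
    if (this.startedWithHardware === undefined) {
      // First selection: remember how the session started.
      this.startedWithHardware = impl === "hardware";
      return;
    }
    // Only a session that began on hardware can report fallback,
    // so "HW was never available" is not revealed.
    if (this.startedWithHardware) {
      this.fallback = impl === "software";
    }
  }

  decoderFallback(): boolean {
    return this.fallback;
  }
}
```

A session that starts in software, for instance because another application already holds the HW decoder, never reports fallback, which addresses the contention concern raised above.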

henbos commented 1 year ago

Clever!

dontcallmedom-bot commented 1 year ago

This issue was discussed in the minutes of WebRTC March 2023 meeting – 21 March 2023 (Issue #730 The HW exposure check does not solve Cloud Gaming use cases)

dontcallmedom-bot commented 1 year ago

This issue was discussed in WebRTC Feb 2023 Meeting – (Issue #730 🎞︎)

henbos commented 1 year ago

Based on WG meetings where the desire to expose an event instead has been discussed, this is no longer being considered by the stats spec