Open jgraham opened 1 month ago
A review by the Privacy Interest Group (PING) in 2020 tightened the spec to only reveal absence of camera or microphone to all sites.
Does this also expose the absence/presence of an audio output device? If chromium based browsers were to stop exposing the presence of an audiooutput device, this would break quality of service checks and therefore block everyone from the entire video call flow on software I used to work on and I suspect many other similar applications.
Thanks for filing this issue. We agree that WebRTC's distributed nature sometimes makes in-the-wild problems difficult to reproduce. However, this issue is scoped to the enumerateDevices API, which is a strictly local API where issues are much easier to reproduce.
With regard to the past fingerprinting issues of enumerateDevices, Chrome has implemented all the PING-driven measures that were added to the spec to address those problems, except the requirement to gate enumerateDevices results on active capture. Chrome optimistically agreed to this measure, but unfortunately it broke existing applications when we tried to deploy it. Chrome instead gates enumerateDevices results on camera and microphone permissions, which we believe are sufficient to address the fingerprinting issues.
Another enumerateDevices issue is support for the exposure of non-miked audio output devices when microphone exposure is allowed. This is necessary to support the common use case of a laptop user who uses non-miked headphones and the laptop microphone in VC calls. So we have filed https://github.com/w3c/mediacapture-main/issues/1019 and https://github.com/w3c/mediacapture-output/issues/147, which propose to change the content of the corresponding enumerateDevices tests, and so they should be resolved before adding these tests to the Interop list. We don't think either of those issues will regress the fingerprinting protection added by the overall PING-driven changes.
A review by the Privacy Interest Group (PING) in 2020 tightened the spec to only reveal absence of camera or microphone to all sites.
Does this also expose the absence/presence of an audio output device?
Hi @lukewarlow, it's unclear whether you're asking about the PING review or Interop 2025.
Audio output devices exposed through microphone permission appear alongside microphones, so they were categorically included in the PING review.
But audio output is covered by separate WPT tests https://wpt.fyi/results/audio-output not presently included in this proposal for Interop 2025. I'm fine with leaving it that way to have time to resolve the details in https://github.com/w3c/mediacapture-output/issues/147 (which is about what set of speakers the microphone permission exception applies to).
If chromium based browsers were to stop exposing the presence of an audiooutput device, this would break quality of service checks and therefore block everyone from the entire video call flow on software I used to work on and I suspect many other similar applications.
Does this software and the other "similar applications" work in Firefox and Safari? If not, they seem an example of the problem Interop 2025 exists to solve.
Does this software and the other "similar applications" work in Firefox and Safari? If not, they seem an example of the problem Interop 2025 exists to solve.
I can only speak directly to the application that I worked on but that currently uses user agent detection to skip the audio output check (assume it passes) in Firefox and Safari. Hence it's only in this case a web compat risk for chromium to stop exposing them, rather than a direct interop issue. Though the status quo can lead to situations where firefox and WebKit users that don't have speakers have a degraded experience (can join a video call that would never work because they have no audio output).
Having said all that assuming this interop proposal only covers microphone and camera and makes no decision one way or the other for audio output then that's okay for now but my concerns would apply to any decisions made about audio output exposure in future.
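For readers unfamiliar with this kind of quality of service check, here is a minimal sketch of what it might look like. It assumes the application already knows, by whatever means, whether the current browser lists audio output devices at all. The function name and the `browserListsOutputs` flag are hypothetical and not taken from any real application.

```javascript
// Hypothetical helper: classify audio output availability from an
// enumerateDevices() result (an array of MediaDeviceInfo-like objects).
function audioOutputStatus(devices, browserListsOutputs) {
  // Where the browser does not list speakers, their absence proves nothing.
  if (!browserListsOutputs) return "unknown";
  const found = devices.some((d) => d.kind === "audiooutput");
  return found ? "present" : "absent";
}

// In a page (not run here):
//   const devices = await navigator.mediaDevices.enumerateDevices();
//   const status = audioOutputStatus(devices, true);
```

Returning "unknown" rather than "absent" keeps the check from blocking users in browsers that simply do not report speakers.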
Having said all that assuming this interop proposal only covers microphone and camera and makes no decision one way or the other for audio output then that's okay for now but my concerns would apply to any decisions made about audio output exposure in future.
To satisfy the proposal, browsers would need to limit microphone exposure ahead of gUM success. I can't speak to how an implementation that is already out of spec on the matter of speakers would treat speakers.
Chrome optimistically agreed to this measure, but unfortunately it broke existing applications when we tried to deploy it.
That was 4 years ago. Today, both Safari and Firefox ship this measure. With those browsers already having taken the compatibility hit, what is holding Chrome back from fixing crbug 40138537 today? What websites do you remain concerned would still break? Wouldn't they only work in chromium browsers? Isn't that problematic?
So far we've encountered zero breakage from Firefox 132 implementing this measure.
This isn't just about fingerprinting — which remains an issue: having scanned a QR code on a website last year shouldn't opt people into tracking — it's also about interoperability across permission models.
What Chrome has implemented was never in the spec: near zero exposure without permission and full exposure with. The old spec had allowances that let websites build device pickers (albeit without labels) ahead of permission that sorta worked in all browsers.
In contrast, what Chrome has implemented unilaterally has created a huge interoperability issue for other browsers where websites can reliably implement a device chooser on pageload (after priming for permission just once) only in chromium browsers.
Maybe it's time to try again?
We should continue this discussion in https://github.com/w3c/mediacapture-main/issues/1019, where I'll leave a full reply, since this issue is just about the interop request, and Chrome has nothing to add about that request. The Interop guide explicitly says that Interop is not a venue for performing standards work.
However, for people not interested in following https://github.com/w3c/mediacapture-main/issues/1019, I just want to clarify here that Chrome did not unilaterally implement anything, much less something that was never in the spec.
The actual story is:
As you can see, there is nothing unilateral here.
As for the "device pickers (albeit without labels) ahead of permission that sorta worked in all browsers", no one ever built those pickers because they would have been useless for users, with zero human-readable information. Nothing broke when it became impossible to create those theoretical pickers, as they never existed.
The Interop guide explicitly says that Interop is not a venue for performing standards work.
No standards work is being performed here. I'm asking what stands in the way of Chrome fixing a bug open for four years to adhere to the existing standard. The claim that websites would be broken seems unsupported given that Firefox and Safari have shipped this. No list of such websites has been produced.
- Chrome had to roll back gUM-before-eD because it broke existing applications. gUM-before-eD represented at best a marginal improvement in fingerprinting, so the breakage was not justified.
That was on October 5th, 2020.
- Chrome filed an issue based on this implementation experience, to be discussed in the WG.
That was on October 15th, 2024, four years later. (6 days after this interop proposal).
The actual timeline:
I don't think implementation experience that predates CR snapshot 3 years ago qualifies as new information for the WG to revisit this issue.
As for the "device pickers (albeit without labels) ahead of permission that sorta worked in all browsers", no one ever built those pickers because they would have been useless for users, with zero human-readable information.
The pickers required trial and error between "camera 1" and "camera 2", but worked across browsers (unlike what you suggested in the comment you retracted). Labels filled in after use. You might not have seen this being in Chrome.
But yes, poor usability interop and fingerprinting concerns are why the WG moved away from eD-before-gUM.
But the WG moved to gUM-before-eD, not eD-before-gUM-with-persistent-permission which Chrome invented.
The WG would never have moved to the latter as it is not interoperable (doesn't work in Safari or Firefox by default).
Nothing broke when it became impossible to create those theoretical pickers, as they never existed.
So which websites still exist that rely on eD-before-gUM?
The Interop guide explicitly says that Interop is not a venue for performing standards work.
No standards work is being performed here. I'm asking what stands in the way of Chrome fixing a bug open for four years to adhere to the existing standard. The claim that websites would be broken seems unsupported given that Firefox and Safari have shipped this. No list of such websites has been produced.
Compatibility with existing applications. In your own words, by 2023 Most video conferencing sites offer a smoother user experience to returning Chrome users than to returning users in other browsers, because they basically ignore past non-persisted permissions entirely
Risk of breaking compatibility with Most video conferencing sites is what prevents Chrome from fixing that "bug". At this point we consider it a spec bug, not a Chrome bug, and we need to resolve the spec issue before working on interop.
- Chrome had to roll back gUM-before-eD because it broke existing applications. gUM-before-eD represented at best a marginal improvement in fingerprinting, so the breakage was not justified.
That was on October 5th, 2020.
Again, in your own words, Most video conferencing sites did code similar to
const perm = await navigator.permissions.query({name: "camera"});
if (perm.state == "prompt") {
  nagTheUserAboutEnablingPermission();
}
You gave yourself the answer.
- Chrome filed an issue based on this implementation experience, to be discussed in the WG.
That was on October 15th, 2024, four years later. (6 days after this interop proposal).
Yes. When we analyzed this interop proposal we realized that we had this issue with gUM-before-eD and noticed that we had not filed a spec issue about it. Then we filed one so that we could discuss it in the WG and reference it here.
The actual timeline:
- Chrome did NOT file an issue based on this implementation experience, to be discussed in the WG at the time
Just like Firefox did NOT implement the PING-driven changes for 5 years, until a couple of weeks ago.
- On October 13th 2021, Chrome WG members agree to publish a Candidate Recommendation Snapshot that includes the measure without objections
The thinking at the time was that there was no reason to block it because we thought we could redeploy the change. However, as you so eloquently stated years later, Most video conferencing sites are still written against permissions-before-eD-labels.
- Safari implements the spec
- (time skip)
- Firefox implements the spec (after a year of outreach and compat work)
- On October 7th 2024, I inform you of our interop plans
- At a co-chair call I believe you verbally agreed you might be able to make this work given that Firefox has shipped
In the email thread we had with Youenn, I confirmed that we supported including "RTCRtpScriptTransform" and "RTCDataChannels transferable to workers". We had worked recently on these items and we did not have any blocking spec issue. In the co-chair meeting I told you that we might be able to support it given that Firefox finally implemented the main PING-driven measures, but did not give any assurances since we needed to double check things. When we checked the WPTs we noticed that gUM-before-eD is actually a blocker that needs more work in the WG because implementing it is a high compatibility risk for Chrome.
- On October 15th 2024, you filed "What is the purpose of requiring a successful gUM call before enumerateDevices?" (w3c/mediacapture-main#1019)
I don't think implementation experience that predates CR snapshot 3 years ago qualifies as new information for the WG to revisit this issue.
Why not? You confirmed that this was still a problem for Most video conferencing sites last year, and we have no reason to believe that the potential compatibility problems have gone away.
As for the "device pickers (albeit without labels) ahead of permission that sorta worked in all browsers", no one ever built those pickers because they would have been useless for users, with zero human-readable information.
The pickers required trial and error between "camera 1" and "camera 2", but worked across browsers (unlike what you suggested in the comment you retracted). Labels filled in after use. You might not have seen this being in Chrome.
No real VC application ever built those pickers because they would have been useless. If VC sites had done that, we would have seen major breakage when we implemented the PING-driven changes, and I do not recall a single bug filed related to that.
And "retract" is not the appropriate term. I just simplified the comment with the information that is most relevant for this venue. I will add the more detailed content in our conversation in the WG issue.
But yes, poor usability interop and fingerprinting concerns are why the WG moved away from eD-before-gUM.
What is eD-before-gUM? The model was permissions-before-eD-with-labels. And permissions often means a gUM call.
But the WG moved to gUM-before-eD, not eD-before-gUM-with-persistent-permission which Chrome invented.
First off, whether permissions are persistent or ephemeral is outside the spec. Each UA is free to implement permission persistence in any way it prefers.
The old spec had permissions-before-eD-with-labels. And Chrome reverted to it. That was and is still Safari's model. It's just that since Safari's permissions are ephemeral, permissions-before-eD-with-labels is exactly the same as gUM-before-eD-with-labels.
The WG would never have moved to the latter as it is not interoperable (doesn't work in Safari or Firefox by default).
It was and still is Safari's model. It's just that Safari's model is compatible with both the old and new specs. Chrome implemented the new spec, and it broke compatibility with, in your own words, Most video conferencing sites, so we had to revert to permissions-before-eD-with-labels as per the old spec.
Nothing broke when it became impossible to create those theoretical pickers, as they never existed.
So which websites still exist that rely on eD-before-gUM?
According to you (and also our observations), just last year it was Most video conferencing sites. I believe this is still the case. And the model is not eD-before-gUM. It is permissions-before-eD-with-labels.
If most sites have actually moved to gUM-before-eD-with-labels, it will work fine with Chrome too, so Chrome is not preventing sites from moving to the current spec, but Chrome will not break compatibility to try to force the move.
I can only speak directly to the application that I worked on but that currently uses user agent detection to skip the audio output check (assume it passes) in Firefox and Safari. Hence it's only in this case a web compat risk for chromium to stop exposing them, rather than a direct interop issue. Though the status quo can lead to situations where firefox and WebKit users that don't have speakers have a degraded experience (can join a video call that would never work because they have no audio output).
Chrome has no plans to make incompatible changes in this area.
... in your own words, Most video conferencing sites did code similar to
const perm = await navigator.permissions.query({name: "camera"}); if (perm.state == "prompt") { nagTheUserAboutEnablingPermission(); }
Those would continue to detect Chrome's persistent permissions just fine. It's the correct way to query permission state.
But what does that have to do with device enumeration? 🤔
Is your concern over websites that (wrongly) use device enumeration to query permission state? Something like:
const [device] = await navigator.mediaDevices.enumerateDevices(); if (!device.label) { nagTheUserAboutEnablingPermission(); }
... in your own words, Most video conferencing sites did code similar to
const perm = await navigator.permissions.query({name: "camera"}); if (perm.state == "prompt") { nagTheUserAboutEnablingPermission(); }
Those would continue to detect Chrome's persistent permissions just fine. It's the correct way to query permission state.
But what does that have to do with device enumeration? 🤔
A slightly more complete code sketch for those applications is:
let perm = await navigator.permissions.query({name: "camera"});
if (perm.state == "prompt") {
  // includes calling getUserMedia() and updating perm with the new permission state after the nag
  nagTheUserAboutEnablingPermission();
}
if (perm.state == "granted") {
  showFullApplicationIncludingPickers(); // includes calling eD()
}
The case that you refer to as "smoother user experience to returning Chrome users" is when the permission has been persisted and the application goes directly to showFullApplicationIncludingPickers() without nagging the user. It is a very important use case for Chrome users and Chrome does not intend to break it.
gUM-before-eD-with-labels breaks this use case, as showFullApplicationIncludingPickers() would show broken pickers.
Is your concern over websites that (wrongly) use device enumeration to query permission state? Something like:
const [device] = await navigator.mediaDevices.enumerateDevices(); if (!device.label) { nagTheUserAboutEnablingPermission(); }
No, that is of no concern.
The case that you refer to as "smoother user experience to returning Chrome users" is when the permission has been persisted and the application goes directly to showFullApplicationIncludingPickers() without nagging the user.
You're mischaracterizing https://github.com/w3c/mediacapture-main/issues/928, where "smoother user experience" refers to avoiding an extra click on a permission priming page ahead of every meeting. That issue was not about eD-before-gUM.
Firefox solved that issue in 132 by returning "granted" from query for users who've given one-time permission in the past. Because users who favor one-time permission should be respected and not treated like first-time visitors every meeting.
The old spec had permissions-before-eD-with-labels. And Chrome reverted to it. That was and is still Safari's model. It's just that since Safari's permissions are ephemeral, permissions-before-eD-with-labels is exactly the same as gUM-before-eD-with-labels.
False Equivalence. Safari and Firefox follow spec, Chrome does not.
showFullApplicationIncludingPickers(); // includes calling eD()
This is the eD-before-gUM use case which the WG abandoned in 2020. The old spec allowed this (with labels in some browsers and initially without in others depending on permission). The spec since then does not.
So which websites still exist that rely on eD-before-gUM?
According to you (and also our observations), just last year it was Most video conferencing sites. I believe this is still the case.
I wasn't talking about eD-before-gUM in that issue (Whereby.com, the example in that issue, does not do eD-before-gUM). In hindsight I shouldn't even have said "most" as that issue turned out to be a lot smaller than expected.
We've also worked with different services over the last year leading up to 132.
I've tried all the major services, and haven't run into any problems yet. Most seem to have a lobby with a comb-check, and turn the camera on if they can.
For there to be a problem, a video conferencing website would need to drop users into a meeting without camera and microphone on (e.g. based on a previous setting or maybe size of meeting), even though the user has granted persistent permission.
This might be plausible, but seems a minor inconvenience.
If you still think this is an issue on "Most video conferencing sites", can you give an example?
You're mischaracterizing w3c/mediacapture-main#928, where "smoother user experience" refers to avoiding an extra click on a permission priming page ahead of every meeting. That issue was not about eD-before-gUM.
Firefox solved that issue in 132 by returning "granted" from query for users who've given one-time permission in the past. Because users who favor one-time permission should be respected and not treated like first-time visitors every meeting.
If the application produces broken pickers for returning users, that is not a smooth user experience for returning users. It is in fact a broken user experience and is a blocker for Chrome.
The old spec had permissions-before-eD-with-labels. And Chrome reverted to it. That was and is still Safari's model. It's just that since Safari's permissions are ephemeral, permissions-before-eD-with-labels is exactly the same as gUM-before-eD-with-labels.
False Equivalence. Safari and Firefox follow spec, Chrome does not.
I'm not making any equivalence. I'm saying the old spec had permissions-before-eD-with-labels and Chrome reverted back to that model after gUM-before-eD-with-labels broke applications. You were claiming that Chrome invented some new model, which is an incorrect statement. Chrome simply could not migrate from permissions-before-eD-with-labels to gUM-before-eD-with-labels because it breaks applications. I also pointed out that Safari's model is the same as the old spec because it uses ephemeral permissions.
I've tried all the major services, and haven't run into any problems yet. Most seem to have a lobby with a comb-check, and turn the camera on if they can.
I just tried Zoom as a returning user on Firefox to see if I could get a smooth user experience. I first tried joining a meeting to which I was invited and Zoom's default behavior was to join the meeting with the camera off and the UI showed broken pickers. Then I tried to start a meeting with "Host with Video off", which is a prominent feature in Zoom, and again the UI showed broken pickers. This is exactly the breakage Chrome wants to avoid. Supporting these Zoom use cases correctly is very important for Chrome.
For there to be a problem, a video conferencing website would need to drop users into a meeting without camera and microphone on (e.g. based on a previous setting or maybe size of meeting), even though the user has granted persistent permission.
This might be plausible, but seems a minor inconvenience.
This is not just plausible, but a common use case with Zoom, broken by gUM-before-eD-with-labels. Chrome does not consider it a minor inconvenience, but an unacceptable regression.
If you still think this is an issue on "Most video conferencing sites", can you give an example?
Zoom.
You mean like this?
Oh wait, that's Chrome.
Not as a returning user. If you have never given permission to share cameras/microphones it is correct that the application cannot access your cameras/microphones.
Select "Allow on every visit" on that dialog in your Chrome screenshot, and select "Remember for all cameras and microphones" for the equivalent dialog on Firefox so that both browsers provide a promptless experience for returning users. Close the browser, reopen it, and go to Zoom again. You'll see the breakage only in Firefox.
You're mischaracterizing https://github.com/w3c/mediacapture-main/issues/928, where "smoother user experience" refers to avoiding an extra click on a permission priming page ahead of every meeting. That issue was not about eD-before-gUM.
Firefox solved that issue in 132 by returning "granted" from query for users who've given one-time permission in the past. Because users who favor one-time permission should be respected and not treated like first-time visitors every meeting.
Whereby still requires an extra click on a priming page on Firefox. The step is for the gUM call prior to enumerateDevices, so that Whereby can create a proper UI with functional pickers. So, despite the fix in the permissions API, returning users still don't get the "smoother user experience".
At this point I think it is clear that gUM-before-eD-with-labels introduces serious compatibility problems with existing applications and is not ready to be considered for Interop. I propose that we continue the discussion in https://github.com/w3c/mediacapture-main/issues/1019, where I listed even more examples of breakage.
Not as a returning user.
Yes, as a returning user. Did you miss the "Allow this time" option?
If you have never given permission to share cameras/microphones it is correct that the application cannot access your cameras/microphones.
That is false. Select "Allow this time" on that dialog in the Chrome screenshot. Close the tab, open a new one, and go to Zoom again. You'll see the "breakage" in Chrome again.
You're falsely assuming every "returning user" is using persistent permission. What you call "breakage" is normal behavior for many users and they haven't complained.
It's not even technically "breakage" because the websites are handling it, substituting an empty string for "Unrecognized microphone1" etc.
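To illustrate the kind of handling meant here, a minimal sketch of a picker that fills in placeholder names when enumerateDevices() returns empty labels. The helper name and the placeholder wording are hypothetical; real sites choose their own strings.

```javascript
// Hypothetical helper: build picker entries from an enumerateDevices()
// result, numbering devices per kind when the browser returns no label.
function pickerEntries(devices) {
  const names = { audioinput: "Microphone", videoinput: "Camera", audiooutput: "Speaker" };
  const counts = {};
  return devices.map((d) => {
    counts[d.kind] = (counts[d.kind] || 0) + 1;
    const fallback = `${names[d.kind] || "Device"} ${counts[d.kind]}`;
    // Keep the real label when present; otherwise use the numbered placeholder.
    return { deviceId: d.deviceId, kind: d.kind, label: d.label || fallback };
  });
}

// In a page (not run here):
//   const entries = pickerEntries(await navigator.mediaDevices.enumerateDevices());
```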
... and select "Remember for all cameras and microphones" for the equivalent dialog on Firefox so that both browsers provide a promptless experience for returning users.
These arguments seem to rest on false equivalence: returning users shouldn't have to give up privacy by escalating their trust in the browser to be considered "returning users".
Not as a returning user.
Yes, as a returning user. Did you miss the "Allow this time" option?
I mean as a returning user that gave persistent permissions in order to have a smoother experience. It is obvious that if you only give permission for a single session, there is no expectation that you'll enjoy the smoother, promptless experience since you'll have to give permission again. That is what you are seeing, and it is a different use case than the one with persistent permissions.
If you have never given permission to share cameras/microphones it is correct that the application cannot access your cameras/microphones.
That is false. Select "Allow this time" on that dialog in the Chrome screenshot. Close the tab, open a new one, and go to Zoom again. You'll see the "breakage" in Chrome again.
That is because you chose not to persist the permission. There is no expectation of a smoother experience in this case. The behavior should be the same as in Safari in this case.
You're falsely assuming every "returning user" is using persistent permission. What you call "breakage" is normal behavior for many users and they haven't complained.
The use case that breaks is the one of a user who gives a persistent permission and has an expectation of a smoother experience without prompts and with the UI working correctly. This use case breaks with gUM-before-eD and is a blocker for Chrome.
It's not even technically "breakage" because the websites are handling it, substituting an empty string for "Unrecognized microphone1" etc.
It's broken because users have no idea what camera or microphone to select. Choosing the wrong microphone can be a serious privacy issue. And it's a major regression if we make an update that breaks this important use case. Chrome already tried it and had to roll it back.
... and select "Remember for all cameras and microphones" for the equivalent dialog on Firefox so that both browsers provide a promptless experience for returning users.
These arguments seem to rest on characterization and rhetoric: returning users shouldn't have to give up privacy by escalating their trust in the browser to be considered "returning users".
I'm using the term "returning users" only because you introduced it in this thread.
The actual use case is a user that trusts the application and gives persistent permissions so that they can enjoy a smoother experience without prompts. Part of the smoother experience is to have functional pickers that allow the user to select the right microphone or camera without being prompted again for permission.
This use case is supported by the old spec and works fine on Chrome with applications like Zoom and Whereby.
If your opinion is that this use case shouldn't be supported, then we have a disagreement and we should discuss it in the WG.
Also, users are not giving up any privacy. They are giving persistent permissions to an application they trust. gUM-before-eD introduces privacy issues too. Consider the use case of a user who trusts the application, gives it persistent permissions to avoid prompts, and configures the application to start/join meetings with the mic off. The only way to provide a proper UI with functional pickers is to open the microphone first, violating the user's privacy. The alternative is to provide a broken picker, but then the user can't select the right microphone, again with potential negative consequences for the user's privacy.
So the argument of gUM-before-eD being categorically better for privacy is incorrect.
Choosing the wrong microphone can be a serious privacy issue.
If it's a serious privacy issue, it should be solved for all users, not just those who persist permission.
This seems better left to apps to solve.
Zoom gives no indication of which camera or microphone is used except for users who hit the little ^ button, so it doesn't seem that important to them.
If they thought this was an issue, Zoom could simply call gUM when the user changes camera instead of waiting until the user unmutes themselves in the meeting (like they do for microphone). Problem solved for all users in all browsers.
getUserMedia() also accepts cached deviceIds from localStorage, which apps can use to remember user settings to ensure users aren't surprised by which camera is used. This also works for all users.
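A minimal sketch of that approach, assuming the site stores the last used camera under a key of its own choosing. The key name and function name are hypothetical. `mediaDevices` would be navigator.mediaDevices and `storage` would be localStorage in a page; they are passed in so the sketch can be exercised outside a browser.

```javascript
// Hypothetical storage key; not from any real application.
const CAMERA_KEY = "preferredCameraId";

// Open the remembered camera if one was saved, otherwise let the browser choose.
async function openPreferredCamera(mediaDevices, storage) {
  const savedId = storage.getItem(CAMERA_KEY);
  // `ideal` lets the browser fall back to another camera if the saved one is gone.
  const video = savedId ? { deviceId: { ideal: savedId } } : true;
  const stream = await mediaDevices.getUserMedia({ video });
  // Remember the camera actually in use for the next visit.
  const [track] = stream.getVideoTracks();
  const { deviceId } = track.getSettings();
  if (deviceId) storage.setItem(CAMERA_KEY, deviceId);
  return stream;
}

// In a page (not run here):
//   const stream = await openPreferredCamera(navigator.mediaDevices, localStorage);
```

Using `ideal` rather than `exact` is a design choice in this sketch: an unplugged camera then results in a different camera being opened instead of a rejected promise.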
If it's a serious privacy issue, it should be solved for all users, not just those who persist permission.
The problem would exist only for those who persist permission.
If there are no permissions (because they are not persistent), the application cannot open any device, correct or wrong, without prompting the user first. Once the user is prompted and approves, they will have access to the device information and be able to select the device they want to use.
This seems better left to apps to solve.
It is impossible to properly support the use case (promptless experience for users who trust the application) with gUM-before-eD.
Zoom gives no indication of which camera or microphone is used except for users who hit the little ^ button, so it doesn't seem that important to them. If they thought this was an issue, Zoom could simply call gUM when the user changes camera instead of waiting until the user unmutes themselves in the meeting (like they do for microphone). Problem solved for all users in all browsers. getUserMedia() also accepts cached deviceIds from localStorage, which apps can use to remember user settings to ensure users aren't surprised by which camera is used. This also works for all users.
I can't speak for Zoom developers or read their minds, so I don't know if this problem in Firefox is a high priority for them. All I can say is that this problem does not exist in Chrome, and Chrome has no plans to introduce it.
... Once the user is prompted and approves, they will have access to the device information and be able to select the device they want to use.
Suggesting users without persistent permissions can easily select the correct device post-gUM, but users with persistent permissions can't, is a false choice fallacy.
Websites don't need separate code paths for different permission states. Take the Zoom example you raised. The issue of users potentially choosing the wrong device exists equally for all users ahead of gUM (not just users with persistent permission):
A prompt from gUM (in Chrome or Safari) doesn't solve the problem of a device being incorrect—it simply grants access. After approval, users don't automatically get to select their preferred device without additional steps. These additional steps can be added whether gUM triggered a prompt or not.
If selecting the wrong device is a serious privacy concern, it should be addressed consistently for all users, not just those with persistent permissions.
Once the application has called gUM, it will be able to list device information and let users select the device they want to use. This is "promptless" with persistent permission.
Risk of breaking compatibility with most video conferencing sites is what prevents Chrome from fixing that "bug".
It's not "most video conferencing", because they have lobbies so it's quite hard to get into a meeting without microphone (so they can implement "are you talking?") and camera (for self-view comb-check in the lobby).
Zoom should be commended as an outlier for letting users in without accessing camera. This is great for users of one-time permission. I hope more websites adopt this model going forward. But it will be imperative to solve problems for all users of those sites.
The spec is trying to not give preferential treatment to one set of users. The goal of persistent permission is to avoid prompts, not enable special device selection use flows that only work for half the users of one browser.
... Once the user is prompted and approves, they will have access to the device information and be able to select the device they want to use.
Suggesting users without persistent permissions can easily select the correct device post-gUM, but users with persistent permissions can't, is a false choice fallacy.
Applications shouldn't call gUM if their users' preference is to keep the microphone and/or camera off. My opinion is that Zoom is doing the right thing by respecting the users' privacy setting, even if it means a degraded UI for Firefox users with persistent permissions. There are other applications that call gUM even when the user preference is to keep devices off, presumably not with the intent of violating the user's privacy setting, but because it's the only way to provide a functional UI to all Firefox users. They would normally close the resulting tracks as quickly as possible, but just opening the device against the user's preference is a problem, even if it's for a short time.
Applications should be able to implement this user journey without having to choose between a broken UI or violating users' privacy settings. gUM-before-eD makes this impossible.
Websites don't need separate code paths for different permission states.
That's up to website authors. For years, websites have been able to provide a smoother user experience to users who trust the site. We shouldn't remove this choice just because we want to force gUM-before-eD.
Take the Zoom example you raised. The issue of users potentially choosing the wrong device exists equally for all users ahead of gUM (not just users with persistent permission):
Users with persistent permission on Chrome can choose the correct device because device information is available ahead of gUM for them.
A prompt from gUM (in Chrome or Safari) doesn't solve the problem of a device being incorrect—it simply grants access. After approval, users don't automatically get to select their preferred device without additional steps. These additional steps can be added whether gUM triggered a prompt or not.
You are describing the use case of users that have not given persistent permission (a prompt from gUM). The experience is less smooth for them, and that is expected. This use case is not broken by gUM-before-eD.
If selecting the wrong device is a serious privacy concern, it should be addressed consistently for all users, not just those with persistent permissions.
The problem doesn't exist in practice for users without persistent permissions since they are prompted whenever a device is going to be opened.
Once the application has called gUM, it will be able to list device information and let users select the device they want to use. This is "promptless" with persistent permission.
Calling gUM when the user has told the application not to call gUM (i.e., keep the camera/mic off) would be a violation of the user's privacy settings. gUM-before-eD forces applications to do this if they want to show a correct UI for users with persistent permissions, or it forces applications to show an incorrect UI if they want to respect the user's preference.
permission-before-eD allows applications to implement a correct UI without violating the user's privacy settings.
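To make the two behaviors concrete, here is a rough sketch of how an application might branch on what enumerateDevices() returned, without calling gUM. It assumes that a browser withholding device details returns entries with an empty `deviceId` and `label`; the `DeviceInfo` and `PickerState` types and the `pickerStateFor` helper are hypothetical names for this sketch, not part of any API.

```typescript
// Subset of MediaDeviceInfo used here; real MediaDeviceInfo objects fit this shape.
interface DeviceInfo {
  kind: "audioinput" | "audiooutput" | "videoinput";
  deviceId: string;
  label: string;
}

// Either a real device picker, or a generic control with no device names.
type PickerState =
  | { mode: "picker"; choices: DeviceInfo[] }
  | { mode: "generic"; hasDevice: boolean };

function pickerStateFor(kind: DeviceInfo["kind"], devices: DeviceInfo[]): PickerState {
  const ofKind = devices.filter((d) => d.kind === kind);
  // Assumption: withheld details show up as empty deviceId and label.
  const detailed = ofKind.filter((d) => d.deviceId !== "" && d.label !== "");
  return detailed.length > 0
    ? { mode: "picker", choices: detailed }
    : { mode: "generic", hasDevice: ofKind.length > 0 };
}

// In a browser this would be used roughly as:
//   const state = pickerStateFor(
//     "videoinput", await navigator.mediaDevices.enumerateDevices());
```

Under permission-before-eD, a user with persistent permission would be expected to land in the `picker` branch before any device is opened. Under gUM-before-eD the same user would land in the `generic` branch until the application calls gUM.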
Risk of breaking compatibility with most video conferencing sites is what prevents Chrome from fixing that "bug".
It's not "most video conferencing", because they have lobbies so it's quite hard to get into a meeting without microphone (so they can implement "are you talking?") and camera (for self-view comb-check in the lobby).
It doesn't have to be "most" for the change to negatively affect many users.
Zoom should be commended as an outlier for letting users in without accessing camera. This is great for users of one-time permission. I hope more websites adopt this model going forward. But it will be imperative to solve problems for all users of those sites.
I agree 100% that Zoom, or any other site, should be able to support the use case of not accessing the camera, including for users who trust the site with persistent permissions, without being forced to open the camera to provide a correct UI.
The spec is trying to not give preferential treatment to one set of users. The goal of persistent permission is to avoid prompts, not enable special device selection use flows that only work for half the users of one browser.
I am not sure the spec ever intended to forbid the use case of applications providing a smoother experience to users who persist permissions, but the fact is that the old spec allows it and many sites provide it.
Also, I don't think this discussion is productive anymore since we've reached a point in which we're just repeating the same arguments multiple times.
Applications shouldn't call gUM if their users' preference is to keep the microphone and/or camera off.
There's no such rule or preference. The user's preference in this case is whether others see/hear them initially when they join a meeting.
Such a join muted preference doesn't preclude a dedicated video conferencing website from turning on the user's devices locally (in response to user actions or maybe even without) to satisfy a myriad of use cases, like
This reality is what separates a dedicated video conferencing site from a non-conferencing website that only casually obtained camera/mic access years ago to snap a profile picture when users created their account, and has been able to use this permission to track them ever since. Having to call gUM was meant as a deterrent to the latter, not the former.
Description
WebRTC is one of the most significant compatibility challenges on the modern web, and Mozilla's experience is that implementation differences in this area are a leading cause of breakage on top sites. The nature of the use cases means that in-the-wild problems are usually very difficult to reproduce, and therefore debug and fix reactively, so this area has even higher than normal dependence on good up-front interoperability.
The navigator.mediaDevices.enumerateDevices() API allows websites unprompted access to information about a user's cameras, microphones and speakers, which is a fingerprinting surface. The API is called by 7.3% of the web, compared with 0.2% for getUserMediaPromise, which suggests this API is used extensively for tracking. A review by the Privacy Interest Group (PING) in 2020 tightened the spec to only reveal absence of camera or microphone to all sites, and to require active camera and microphone access (not just permission) for anything else.
Tests
https://wpt.fyi/results/mediacapture-streams?q=enumeratedevices
Specification
https://w3c.github.io/mediacapture-main/ https://w3c.github.io/mediacapture-output/
Additional Signals
No response