w3c / mediacapture-main

Media Capture and Streams specification (aka getUserMedia)
https://w3c.github.io/mediacapture-main/

Avoid circular definition of muted. #982

Open jan-ivar opened 11 months ago

jan-ivar commented 11 months ago

This definition is backwards: "If live samples are not made available to the MediaStreamTrack it is muted".

Mute causes lack of frames, not the other way around: If a MediaStreamTrack is muted, no live samples are made available to it.

All subsequent language and examples align with muted being an intentional User Agent initiated change:

[image: screenshot of the spec text listing reasons a MediaStreamTrack can be muted]

Crucially, the "change" of state (not just the event) is initiated by the User Agent.

This has caused confusion in implementations. E.g. @youennf replied in https://github.com/w3c/mediacapture-extensions/issues/39#issuecomment-1824119912:

Thanks @guidou, this is really helpful info.

For camera tracks, Chrome just checks if frames have not been received for some time (25 expected frame intervals in most cases), regardless of the underlying reason. This maps well to the spec text that states "If live samples are not made available to the MediaStreamTrack it is muted".
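A minimal sketch of the kind of check described above, for readers who want to see the arithmetic. This is not Chromium's code: the function names are hypothetical, and the only figure taken from the comment is the 25 expected frame intervals.

```typescript
// Sketch only: this is NOT Chromium's implementation. The constant 25 is the
// figure quoted in the comment above; every name here is hypothetical.
const STALL_FRAME_INTERVALS = 25;

// Expected time between two frames, in milliseconds, for a given frame rate.
function expectedFrameIntervalMs(frameRate: number): number {
  if (!(frameRate > 0)) {
    throw new RangeError("frameRate must be a positive number");
  }
  return 1000 / frameRate;
}

// True when no frame has arrived for more than `intervals` expected intervals.
function looksStalled(
  lastFrameAtMs: number,
  nowMs: number,
  frameRate: number,
  intervals: number = STALL_FRAME_INTERVALS,
): boolean {
  const silenceMs = nowMs - lastFrameAtMs;
  return silenceMs > intervals * expectedFrameIntervalMs(frameRate);
}
```

Under these assumptions, a 25 fps camera would be treated as stalled after a little more than one second without frames, whatever the underlying cause.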

The spec allows it. I wonder though whether this model is actually helping web developers. For instance, is it better to have a black video or a frozen video when the track is sent via WebRTC?

In general, the value of an "event" is its intent, that something external happened. Therefore, synthesizing events reactively from symptoms seems a mistake. For example: crbug 941740 implements mute on remote tracks reactively based on (lack of) input, violating the WebRTC spec and causing web compat issues. Doing the same on capture tracks seems like a bug, and should be a violation of this spec, but is attributed to the aforementioned line in the spec.

The stats API that @henbos is working on could be more appropriate for web developers.

FWIW, in Safari, if we do not receive video frames/audio frames after a given amount of time, we fail the capture. We assume that something is wrong and that the web application would be better off restarting capture, maybe with a different device. Some web applications do this, not all of them sadly.

These browser differences are making developers' lives difficult. I wonder whether this space is now mature enough that we could get browsers to share a more consistent model around muted and capture suspension/failure. @jan-ivar, how is Firefox using muted these days for capture tracks? Is Firefox sometimes failing capture?

Firefox fires mute as explained in the OP of https://github.com/w3c/mediacapture-extensions/issues/39#issue-1037935336 (behind a pref) but never reactively from symptoms.

Proposal:

Replace the confusing sentence with "If a MediaStreamTrack is muted, no live samples are made available to it."

guidou commented 11 months ago

The problem with "fixing" these spec definitions that have been in place for years to try to better solve today's problems is that it is extremely difficult to update implementations that have followed the old definition for years. Practically every time Chromium has tried to do that with other similar spec changes, the changes had to be reverted because the new behavior broke existing applications. We have had much better results introducing new and better APIs and removing the old one after applications migrate to the new one (srcObject and the Plan B/Unified Plan transitions are good examples of this). This type of redefinition is also very problematic for applications that need to support new and old browser versions simultaneously (common in enterprise environments).

guidou commented 11 months ago

I would oppose any spec changes to the muted attribute and the corresponding events. I'm OK with defining a new attribute or method with the new definition (e.g., call it isMuted) and removing the old one from the spec. This gives time to applications to migrate to the new version over time without causing abrupt compatibility problems. If we do that, then we can probably move the discussion to requestUnmute(). After all, what we were proposing was adding a bool to better understand if the cause of the mute was the one we want in the new definition or one of the causes allowed by the old definition but not the new one.

youennf commented 11 months ago

@guidou, I understand the concerns. Before diving into those concerns, I understand that there is a desire from Chrome to try moving towards this specific muted definition.

About the concerns, in this particular case, the change is about no longer firing mute events in odd cases. How do you expect it to break existing websites? I would think that some UI might not be updated with the capture-does-not-work-properly, which is not great but not too bad either. And these websites would anyway need to be updated.

As for a new attribute, would it mean new event listeners? If so, this has a very high toll on all browsers and all websites; this seems very complex.

Given that the audio/video stats API will allow simulating these odd-case mute events, would it not be possible to advertise the use of JS polyfills for applications that would like to keep receiving these events? That way, shipping the audio/video stats API and muted event migration guidelines could be sufficient.

I'd like to avoid introducing a boolean whose definition would, from the start, mention that this is for legacy applications and that we plan to obsolete it.

youennf commented 11 months ago

Overall, I like this proposal. I wonder whether we should mention that UAs MAY end capture because of lack of samples for unknown reasons. If we go in that direction, we probably want to update image capture to not fire onmute/onunmute events.

jan-ivar commented 11 months ago

How do you expect it to break existing websites? I would think that some UI might not be updated with the capture-does-not-work-properly, which is not great but not too bad either. And these websites would anyway need to be updated.

Good question. Any apps treating mute as fatal would already fail to interoperate.

eladalon1983 commented 10 months ago

In general, the value of an "event" is its intent, that something external happened. Therefore, synthesizing events reactively from symptoms seems a mistake.

What can the user agent do on platforms where they get no advance knowledge that frames will not be forthcoming?

I see great value in giving the application as much clarity as the user agent can muster. We live in a world of open source operating systems and browsers. Hundreds of millions of people use video-conferencing tools every day. The vendors of these VC applications have large engineering teams. Having clear metrics on where various issues lie, allows these engineers to set out and fix issues in codebases beyond their own - to everyone's benefit.

Specifying that user agents should not mute when they are not sure the issue is an explicit mute would be a step in the wrong direction. Having more fine-grained MuteReasons - as I had proposed elsewhere - would be a step in the right direction.

jan-ivar commented 10 months ago

In general, the value of an "event" is its intent, that something external happened. Therefore, synthesizing events reactively from symptoms seems a mistake.

What can the user agent do on platforms where they get no advance knowledge that frames will not be forthcoming?

We define APIs based on developer needs, not user agent needs.

If the OS mutes, the user agent owns the problem of detecting that and conveying that as an "event" that happened. E.g. If the user agent has reason to believe lack of frames is instead due to an error, then ending the track may be more appropriate.

If the user agent cannot tell whether the OS muted it or whether there was an error, that is its problem to solve. Punting hard questions like this to the webapp doesn't seem reasonable to me.

The spec already defines muted and ended as separate events for this reason. Agreeing on these definitions is what we've committed to in order to have browsers interoperate.

the change is about no longer firing mute events in odd cases. How do you expect it to break existing websites?

An answer to this question would be helpful.

eladalon1983 commented 10 months ago

We define APIs based on developer needs, not user agent needs.

Developers need to be able to debug their users' issues, even if those issues extend beyond the JS application. For a developer of a VC application, the user agent, the operating system, even the hardware - everything is in scope.

If the OS mutes, the user agent owns the problem of detecting that and conveying that as an "event" that happened.

As mentioned in my previous message, the user agent might not be able to understand why they are not receiving new frames. Issuing a mute event in such a case is both spec-compliant and useful (to developers).

If the user agent has reason to believe lack of frames is instead due to an error, then ending the track may be more appropriate.

Why frames are not arriving would not always be known.

If the user agent cannot tell whether the OS muted it or whether there was an error, that is its problem to solve.

Developers cannot afford to sit on their hands and pray that others will solve their problems. We live in a competitive world. Whoever solves their users' problems promptly gains the prize of retaining their customers. Let's empower developers in their quest to serve our mutual users. ("Our mutual users" - shared by the browser and the Web app.)

If the user agent has reason to believe lack of frames is instead due to an error, then ending the track may be more appropriate. [...] The spec already defines muted and ended as separate events for this reason.

You gave an example where you believe ending is better than muting. Even if I agreed, for the sake of argument, that this was correct - what about all other cases? Allow me to quote my colleague Guido: "We need to solve all use cases that arise in practice, not just the simplest one."

dontcallmedom-bot commented 10 months ago

This issue was mentioned in WebRTC December 12 2023 meeting – 12 December 2023 (Solve user agent camera/microphone double-mute (mediacapture-extensions))

guidou commented 10 months ago

This definition is backwards: "If live samples are not made available to the MediaStreamTrack it is muted".

Mute causes lack of frames, not the other way around: If a MediaStreamTrack is muted, no live samples are made available to it.

All subsequent language and examples align with muted being an intentional User Agent initiated change:

Not all subsequent language aligns with muted being an intentional User Agent initiated change. In fact, I would argue that no language at all aligns with this. The word intentional does not appear anywhere in the spec. The list you refer to is presented as situations that "can be" reason to mute a track. Nowhere is it stated that any element in that list is actually a reason to mute, or even that it SHOULD cause mute. At best, it can be interpreted as a MAY. More importantly, Section 4.3.1.1 of the spec says:

The muted/unmuted state of a track reflects whether the source provides any media at this moment.

A MediaStreamTrack is muted when the source is temporarily unable to provide the track with data

And Section 8 says the mute event is fired when "The MediaStreamTrack object's source is temporarily unable to provide data", and the unmute event is fired when "The MediaStreamTrack object's source is live again after having been temporarily unable to provide data".

This makes it clear that the model is that muted means no media from the source to the track, and disabled means no data from the track to its consumers.
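The two-stage model described here (source to track, then track to consumers) can be sketched as a pair of small functions. This is an illustrative reading of the comment, not spec text; the type and function names are hypothetical, and "placeholder" stands for the silence or black frames a consumer sees when no live media reaches it.

```typescript
// Hypothetical names throughout; a sketch of the model described above.
interface TrackFlags {
  muted: boolean;   // set by the user agent: no media from source to track
  enabled: boolean; // set by the application: media from track to consumers
}

type ConsumerOutput = "live" | "placeholder"; // placeholder = silence / black

// First stage: does media flow from the source into the track?
function sourceFeedsTrack(flags: TrackFlags): boolean {
  return !flags.muted;
}

// Second stage: what does a consumer of the track receive?
function consumerOutput(flags: TrackFlags): ConsumerOutput {
  return sourceFeedsTrack(flags) && flags.enabled ? "live" : "placeholder";
}
```

In this sketch a consumer cannot tell a muted track from a disabled one by looking at the output alone; only the two flags distinguish them.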

In general, the value of an "event" is its intent, that something external happened. Therefore, synthesizing events reactively from symptoms seems a mistake.

Maybe it was a mistake that the spec defined the muted attribute and the corresponding events the way it did years ago. But, mistaken or not, that's how it was defined.

For example: crbug 941740 implements mute on remote tracks reactively based on (lack of) input, violating the WebRTC spec and causing web compat issues.

In this case, Chromium is just applying the model defined in the main spec to remote tracks. The WebRTC spec indicates some cases in which the muted attribute should be set/unset, but AFAICT it does not say anywhere that this overrides the model defined in the original MediaStreamTrack specification. It also does not state a new definition of muted specific for WebRTC tracks and does not even list the muted/unmuted events in its [Event Summary section].

Shouldn't specs that override/redefine concepts inherited from other specs explicitly state it? Until we make this more explicit in the WebRTC spec, my position is that https://crbug.com/941740 is not a spec-compliance bug in Chromium. If anything, it looks more like a spec bug in the WebRTC spec.

Doing the same on capture tracks seems like a bug,

Are you saying it seems like a spec bug or a Chromium bug? It is pretty clear to me that Chromium behavior is spec compliant.

and should be a violation of this spec,

Are you saying Chromium behavior is in violation of the spec, or that the spec should be rewritten such that Chromium behavior becomes a violation of the spec?

but is attributed to the aforementioned line in the spec.

Not only that line. As I showed, the concept of muted meaning no data from source to track is in many places in the spec, and is the only way muted is defined.

Firefox fires mute as explained in the OP of w3c/mediacapture-extensions#39 (comment) (behind a pref) but never reactively from symptoms.

Maybe Firefox's behavior is the one in violation of the spec?

Proposal:

Replace the confusing sentence with "If a MediaStreamTrack is muted, no live samples are made available to it."

Can you clarify what this sentence means? Is it a description of something that happens when a track is muted? In that case it's not a definition and it's not that different from the original, except in that it is no longer a definition. Basically, it replaces "A is defined to be B" with "A implies B".

Or is it a statement that if the UA detects a condition that should mute the track, then it should make sure the track does not receive any media?

Either way, the change is not enough, since the concept of muted meaning no data from source to track is in many other places of the spec.

Finally, I am opposed to an incompatible redefinition of the meaning of muted because experience shows that this type of change is difficult to deploy in practice and can lead to more interoperability issues.

I am not opposed to a redefinition that provides a path for existing applications to use a newer, more useful definition, without making it impossible for applications to continue relying on the old definition.

guidou commented 10 months ago

@guidou, I understand the concerns. Before diving into those concerns, I understand that there is a desire from Chrome to try moving towards this specific muted definition.

Yes, we are interested in introducing a new definition that can solve the multiple-mute problem (and even the single mute one), but in a way that doesn't break existing applications or that at least provides a path for existing applications to be easily updated to continue working.

About the concerns, in this particular case, the change is about no longer firing mute events in odd cases. How do you expect it to break existing websites? I would think that some UI might not be updated with the capture-does-not-work-properly, which is not great but not too bad either. And these websites would anyway need to be updated.

In our experience, applications that break are the ones that are hard to think about in advance. We normally find out after rolling out the change. For example, when we implemented the requirement to wait for focus in getUserMedia() we thought nothing would break, and shortly after we started rolling out the change we received reports from some kiosk-like environments that broke because focus was impossible to obtain for those applications. We had to roll back the change.

As for a new attribute, would it mean new event listeners? If so, this has a very high toll on all browsers and all websites; this seems very complex.

Depends on how we define the new attribute. If we go with the muteReason proposal or a similar one, we don't need new event listeners. Applications might need to be updated to look at the mute reason to decide how to proceed, but they would have a path to migrate to the new API without causing permanent breakage.

Given that the audio/video stats API will allow simulating these odd-case mute events, would it not be possible to advertise the use of JS polyfills for applications that would like to keep receiving these events? That way, shipping the audio/video stats API and muted event migration guidelines could be sufficient.

Maybe that can be a solution. Support the old definition via stats and the new definition with muted. I'm not sure the stats spec in its current form supports this, but it's a valid possibility.

I'd like to avoid introducing a boolean whose definition would, from the start, mention that this is for legacy applications and that we plan to obsolete it.

That wouldn't be ideal. It doesn't have to be the case here, though. If we are able to provide a good migration path via stats, that might work. Adding a muteReason or some other API for the new definition would also work.

guidou commented 10 months ago

In general, the value of an "event" is its intent, that something external happened. Therefore, synthesizing events reactively from symptoms seems a mistake.

What can the user agent do on platforms where they get no advance knowledge that frames will not be forthcoming?

We define APIs based on developer needs, not user agent needs.

I don't think user agents have needs other than the ones of their users (including developers).

If the OS mutes, the user agent owns the problem of detecting that and conveying that as an "event" that happened. E.g. If the user agent has reason to believe lack of frames is instead due to an error, then ending the track may be more appropriate.

To me all this sounds a lot like synthesizing events reactively from symptoms.

The spec already defines muted and ended as separate events for this reason. Agreeing on these definitions is what we've committed to in order to have browsers interoperate.

Yes. Chromium implements both according to the spec. What we're discussing here is how to change the spec to solve new problems (e.g., multiple mute) in a way that doesn't introduce unsurmountable compatibility problems for existing applications.

the change is about no longer firing mute events in odd cases. How do you expect it to break existing websites?

An answer to this question would be helpful.

Already answered in a previous message.

alvestrand commented 10 months ago

We define APIs so that developers can satisfy user needs for applications running on a specific UA. The UA has no needs; it exists to satisfy the user - in the case of JS apps, to let the app developers satisfy the users.

The UA and the OS are not friends. And the user has a direct relationship to both.

When an OS-level mute is applied, and can only be rectified using the user's relationship with the OS, the user needs to know that it has to act in relation to the OS.

If the OS offers API to the UA so that the UA can let the app developer satisfy the user's need (in this case: to unmute), the user's needs will be simpler to satisfy.

The difference between muted and ended in our specs is that one is reversible, the other isn't. So anything that is not based on a clear signal that the source is gone and won't come back should be "muted", not "ended". "Reason to believe" sounds like "probable cause", not "clear signal".
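The reversible versus irreversible distinction drawn here can be written down as a tiny state machine. This is a sketch of the argument, with hypothetical type and function names, not an excerpt from any spec algorithm.

```typescript
// Hypothetical names; a sketch of "muted is reversible, ended is not".
type SketchTrackState = "live" | "muted" | "ended";
type SketchSignal = "mute" | "unmute" | "ended";

function nextTrackState(
  state: SketchTrackState,
  signal: SketchSignal,
): SketchTrackState {
  // Once ended, nothing brings the track back.
  if (state === "ended") {
    return "ended";
  }
  if (signal === "ended") {
    return "ended";
  }
  // mute and unmute can alternate any number of times.
  return signal === "mute" ? "muted" : "live";
}
```

Read this way, choosing "ended" for an uncertain condition throws away the option of recovering later, which is the reason given above for preferring "muted" when there is no clear signal that the source is gone for good.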

youennf commented 10 months ago

If we are able to provide a good migration path via stats, that might work.

Video deliveredFrames can be used with a timer-based approach to shim existing Chromium muting events for video tracks. Alternatively, requestVideoFrameCallback (rvfc), which has already shipped, can be used to detect that frames are not flowing. This probably makes video the easier one to migrate first.

Audio deliveredFrames can be used for microphone tracks; AudioWorklet will most probably expose 0 in case of missing audio frames.

This approach does not require creating new APIs and allows web applications to fine-tune their own detection heuristics.
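A timer-based shim along these lines might look like the sketch below. It is deliberately decoupled from the browser: the counter is read through a caller-supplied function, the clock is passed in explicitly, and the class name, method names and threshold are all hypothetical. How the counter is obtained in a real page (for example from a track's stats object, where available) is an assumption left to the caller.

```typescript
type ShimSignal = "mute" | "unmute";

// Sketch of a polling shim: emits "mute" when a frame counter stops
// advancing for `stallThresholdMs`, and "unmute" when it advances again.
class FrameStallShim {
  private readDeliveredFrames: () => number;
  private stallThresholdMs: number;
  private lastCount: number;
  private lastProgressAtMs: number;
  private stalled = false;

  constructor(
    readDeliveredFrames: () => number, // hypothetical counter accessor
    stallThresholdMs: number,
    startMs: number,
  ) {
    this.readDeliveredFrames = readDeliveredFrames;
    this.stallThresholdMs = stallThresholdMs;
    this.lastCount = readDeliveredFrames();
    this.lastProgressAtMs = startMs;
  }

  // Call periodically (e.g. from a timer); returns a signal or null.
  poll(nowMs: number): ShimSignal | null {
    const count = this.readDeliveredFrames();
    if (count !== this.lastCount) {
      this.lastCount = count;
      this.lastProgressAtMs = nowMs;
      if (this.stalled) {
        this.stalled = false;
        return "unmute";
      }
      return null;
    }
    const idleMs = nowMs - this.lastProgressAtMs;
    if (!this.stalled && idleMs >= this.stallThresholdMs) {
      this.stalled = true;
      return "mute";
    }
    return null;
  }
}
```

Because the threshold is a constructor argument, each application can pick its own heuristic, which is the flexibility the comment above points to.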

@guidou, do you think this migration path would work?

alvestrand commented 10 months ago

That's not a migration path, that's a redefinition. It proves that there's nothing preventing other browsers from emulating Chrome's behavior, even if you want to do it in a shim. What possible advantage would there be to Chrome in departing from the existing behavior, which is consistent with the current definition?

jan-ivar commented 10 months ago

For example: crbug 941740 implements mute on remote tracks reactively based on (lack of) input, violating the WebRTC spec and causing web compat issues.

In this case, Chromium is just applying the model defined in the main spec to remote tracks. The WebRTC spec indicates some cases in which the muted attribute should be set/unset, but AFAICT it does not say anywhere that this overrides the model defined in the original MediaStreamTrack specification. It also does not state a new definition of muted specific for WebRTC tracks and does not even list the muted/unmuted events in its [Event Summary section].

This seems wrong. I've filed https://github.com/w3c/webrtc-pc/issues/2915 on this. Let's discuss that there.

I think I see now how we came to have this vague language. MediaCapture-main is trying to establish both a model for all sources, while simultaneously specifying camera and microphone sources explicitly. I think it needs to do a better job separating when it's doing one or the other.

At its core, I think most people consider muting to be a conscious action based on intent. A reason, not a reaction.

eladalon1983 commented 10 months ago

At its core, I think most people consider muting to be a conscious action based on intent. A reason, not a reaction.

Correct me if I am wrong, but at the time that mute was specified, I believe no user agent allowed users to mute the mic/camera, nor did any OS. What conscious action were Web apps intended to discover? By whom? How was this actionable to such Web apps?

jan-ivar commented 10 months ago

It's in the OP: "There can be several reasons for a MediaStreamTrack to be muted: the user pushing a physical mute button on the microphone, the user closing a laptop lid with an embedded camera, the user toggling a control in the operating system, the user clicking a mute button in the User Agent chrome, the User Agent (on behalf of the user) mutes, etc."

The "etc." refers to other "reasons" ... "the User Agent initiates such a change", including "access may get stolen ... in case of an incoming phone call on mobile OS".

I dunno when Safari implemented its pause, but I think it was fairly early? But I don't understand why it matters since it's common and desirable for specs to exist before implementations. Specs define implementations.

When I said "most people" I meant outside of WebRTC. Muting is a verb, a function.

jan-ivar commented 10 months ago

An answer to this question would be helpful.

Already answered in a previous message.

Could you link to it please? This issue is getting long. Please give an example of an application relying on Chrome's behavior and what action it takes. E.g. is it showing the user a message that "things are broken and no-one can hear you, please wait, maybe"?

guidou commented 10 months ago

An answer to this question would be helpful.

Already answered in a previous message.

Could you link to it please? This issue is getting long. Please give an example of an application relying on Chrome's behavior and what action it takes. E.g. is it showing the user a message that "things are broken and no-one can hear you, please wait, maybe"?

The answer is that, in our experience, applications that break with this type of change are the ones that are hard to think about in advance. We normally find out after rolling out the change. For example, when we implemented the requirement to wait for focus in getUserMedia() we thought nothing would break, and shortly after we started rolling out the change we received reports from some kiosk-like environments that broke because focus was impossible to obtain for those applications.

IMO, the bar for changing a definition that has been in place for years both in spec and implementations should be very high, even if the proposed change is obviously better.

eladalon1983 commented 10 months ago

IMO, the bar for changing a definition that has been in place for years both in spec and implementations should be very high, even if the proposed change is obviously better.

I agree. And even if we could come to an agreement - it does not appear it will come quickly or easily. Now, @jan-ivar has recently posted something I wholeheartedly agree with:

Web developers should not suffer while vendors reach agreement.

In the spirit of these wise words, I propose we now proceed with one of the backwards-compatible proposals currently under discussion, such as MuteReason or MediaSession. (Full disclosure - I have a strong preference for the former.)

youennf commented 10 months ago

even if the proposed change is obviously better.

It seems we all agree this definition would be better. It would make sense to work towards getting all implementations aligned on that definition.

A path forward has been described, via a shim of current Chrome behaviour. This seems a practical approach to me. If not, I'd like to understand why.

such as MuteReason

This would solidify a model of muted being open ended and loosely defined. A dedicated event based API for each cause where mute might be useful would lead to better interop/convenience to web developers.

MediaSession

We need to make MediaSession and MediaStreamTrack consistent, let's do that whatever we decide here.

eladalon1983 commented 10 months ago

This would solidify a model of muted being open ended and loosely defined.

Even with the proposal here "mute" would still cover both OS-based and UA-based muting. Letting the Web app know which it is does not make it open ended or loosely defined. Carving out an "unspecified" for hardware, issues, or anything we might not be thinking of, does not solidify the model; later migration would be equally challenging then as it is now.

A dedicated event based API for each cause where mute might be useful would lead to better interop/convenience to web developers.

I am not opposed to dedicated events, but they seem to be less elegant a solution, given the possibility of multiple concurrent mutes. Conversely, a single mute state with multiple reasons, allows observing the transition from empty set to non-empty set, which is great for apps that only care about that.
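The "single mute state with multiple reasons" idea can be sketched as a set whose empty/non-empty transitions drive the mute and unmute signals. The reason values and class name below are hypothetical placeholders, not the contents of the MuteReason proposal.

```typescript
// Hypothetical reason values, for illustration only.
type HypotheticalMuteReason = "operating-system" | "user-agent" | "unspecified";

class MuteReasonTracker {
  private readonly reasons = new Set<HypotheticalMuteReason>();

  // The track counts as muted while at least one reason is active.
  get muted(): boolean {
    return this.reasons.size > 0;
  }

  // Returns "mute" only on the empty -> non-empty transition.
  addReason(reason: HypotheticalMuteReason): "mute" | null {
    const wasMuted = this.muted;
    this.reasons.add(reason);
    return wasMuted ? null : "mute";
  }

  // Returns "unmute" only on the non-empty -> empty transition.
  removeReason(reason: HypotheticalMuteReason): "unmute" | null {
    const wasMuted = this.muted;
    this.reasons.delete(reason);
    return wasMuted && !this.muted ? "unmute" : null;
  }

  currentReasons(): HypotheticalMuteReason[] {
    return Array.from(this.reasons);
  }
}
```

An app that only cares whether the track is muted watches the two signals; an app that cares why can inspect `currentReasons()` at any time, including while several mutes overlap.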

alvestrand commented 10 months ago

It seems to me that even with the greatest selection of mute reasons imaginable, there is likely to be the case of "this source is producing silence and I don't know why". I think that's a reasonable description of the cases where Chrome currently mutes and other browsers have not chosen to mute.

Note: I'm unclear about whether Chrome fires mute events on "no signal" in audio. If we do, I think the signal Chrome is reacting to on audio is digital silence (all zeroes), which is different from "no speech detected" - there's always some noise in real audio.
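To make the distinction concrete, here is a sketch of a digital-silence check on a block of audio samples. It assumes samples arrive as a Float32Array, as they do in Web Audio processing callbacks; the function name is hypothetical and nothing here is claimed to be what Chrome does.

```typescript
// True only when every sample in the block is exactly zero.
// Note that an empty block also counts as silent in this sketch.
function isDigitalSilence(samples: Float32Array): boolean {
  for (let i = 0; i < samples.length; i++) {
    if (samples[i] !== 0) {
      return false;
    }
  }
  return true;
}
```

A real microphone in a quiet room would normally fail this check because of background noise, which is why it is a different signal from "no speech detected".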

youennf commented 10 months ago

"this source is producing silence and I don't know why"

It is hard to make progress without precisely knowing how/when Chrome is firing mute events on capture tracks. I understand that Chrome's intent is to currently use mute to notify web applications that a capture track is potentially malfunctioning. Is that correct?

That seems valuable information to provide to the web page. AIUI, this is one of the goals of MediaStreamTrack stats, though a dedicated API might make web developers' lives easier.

For video, MediaStreamTrack stats is hopefully sufficient to detect these malfunctioning cases. For audio, it is unclear whether MediaStreamTrack stats is enough, maybe this should get fixed.

eladalon1983 commented 10 months ago

For video, MediaStreamTrack stats is hopefully sufficient to detect these malfunctioning cases. For audio, it is unclear whether MediaStreamTrack stats is enough, maybe this should get fixed.

When the mute event is fired and the app observes it and turns to handle it, what stats are available to it that would definitively, non-heuristically inform it that the track is muted due to an upstream entity such as the OS or UA?

jan-ivar commented 10 months ago

For video, MediaStreamTrack stats is hopefully sufficient to detect these malfunctioning cases. For audio, it is unclear whether MediaStreamTrack stats is enough, maybe this should get fixed.

When the mute event is fired and the app observes it and turns to handle it, what stats are available to it that would definitively, non-heuristically inform it that the track is muted due to an upstream entity such as the OS or UA?

What is "an upstream entity such as the OS or UA" distinct from, when all muting is "UA" by definition? This seems to be the definition problem we're having.

Turning the question around:

When the mute event is fired in Chrome and the app observes it and turns to handle it, what stats are available to it that would inform it that the track is malfunctioning?

In other browsers, apps could detect this (e.g. using stats once implemented):
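The argument can be illustrated with a small sketch (hypothetical names, not any browser's code, and not the snippet from the original comment). `diagnoseTrack` is what an app could compute from the muted flag plus a frames-advancing signal such as a stats counter; `reactiveMutedFlag` models a user agent that also sets muted whenever frames stop.

```typescript
type TrackDiagnosis = "healthy" | "muted" | "possible-malfunction";

// What an app can infer from the muted flag and a frame-progress signal.
function diagnoseTrack(muted: boolean, framesAdvancing: boolean): TrackDiagnosis {
  if (muted) {
    return "muted";
  }
  return framesAdvancing ? "healthy" : "possible-malfunction";
}

// Model of a reactive user agent: muted is also set when frames stop.
function reactiveMutedFlag(intentionalMute: boolean, framesAdvancing: boolean): boolean {
  return intentionalMute || !framesAdvancing;
}
```

With an intent-only muted flag, a stalled but unmuted track is diagnosed as "possible-malfunction". With the reactive flag, the same situation is reported as "muted", so the third outcome can never be observed.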

In Chromium, apps cannot, because Chromium circularly masks the symptom, making malfunction indistinguishable from "OS or UA" mute.

This problem seems unique to Chromium, as does the need for a new mute-reason API to resolve it.

youennf commented 10 months ago

To make progress, I think we should leave the UA vs. OS muting discussion out of this particular issue. This can be resolved orthogonally to this discussion.

The proposal is something like:

  1. UAs refrain from firing mute events in malfunction cases.
  2. We design a shim based on MediaStreamTrack stats that emulates malfunctioning mute events.
  3. If MediaStreamTrack stats is not sufficient for 2, we augment the API surface (in MediaStreamTrack stats or elsewhere, new event e.g.).
  4. If the track is UA-muted, there are no stats, but there is no need to know whether it is malfunctioning or not.

Other than requiring changes in UAs, I do not see any drawback. Am I missing something?

alvestrand commented 10 months ago

To make progress, I think we should leave the UA vs. OS muting discussion out of this particular issue. This can be resolved orthogonally to this discussion.

The proposal is something like:

  1. UAs refrain from firing mute events in malfunction cases.

I don't see the justification for this. It draws a distinction between "malfunction" and "non-malfunction" that seems unwarranted and unenforceable (if a user unplugs the camera, it's a mute event; if the cat bites off the camera cable, it's a malfunction????)

  2. We design a shim based on MediaStreamTrack stats that emulates malfunctioning mute events.

Since 1 is unjustified, 2 is unreasonable. Also, shims don't belong in the spec. If Firefox or Safari wish to emulate Chrome's behavior, they're free to incorporate a shim of that nature, but I don't see a point in changing Chrome's behavior.

  3. If MediaStreamTrack stats is not sufficient for 2, we augment the API surface (in MediaStreamTrack stats or elsewhere, new event e.g.).
  4. If the track is UA-muted, there are no stats, but there is no need to know whether it is malfunctioning or not.

Other than requiring changes in UAs, I do not see any drawback. Am I missing something?

Since I don't see the point of the change, I don't see any advantage in making it.

henbos commented 10 months ago

Muted is outside the control of web applications, but can be observed [... reasons why mute can happen]. The User Agent SHOULD provide this information to the web app through muted and its associated events.

Whenever the User Agent initiated such a change, [...]

When the referenced text says the UA "initiates such a change", I believe it is referring to the steps to mute the MediaStreamTrack JS object which only the UA can modify, i.e. the steps to make the muting visible to the web app. Do read the previous sentence about how the UA should expose this information to the app. Also read all the examples: they're full of things that happened that were not "initiated by the UA" (laptop lid closing, incoming phone call, etc). The only thing initiated by the UA is firing the event; it is reactive, not proactive.

Replace the confusing sentence with "If a MediaStreamTrack is muted, no live samples are made available to it."

This does not make it less confusing. It begs the question: why is it muted? Even under this definition, my reading is still that the UA should detect mute on a higher layer - including reasons of malfunction, the "etc" is really a catch-all - and then initiate the exposure of the mute event. My understanding is Chromium is spec-compliant both with and without this sentence changed.

In other words, today mute means "I'm not getting any frames despite the track being enabled". This makes sense to know whether or not you care about the reason. And because we haven't exposed the reason yet, people haven't been allowed to care about why yet. So from a web developer POV, the use case this solves is still valid and it is backwards compatible not to change it.

If we add the reason, then apps that do care about why have enough information to make the distinction, solving both the use case of caring and the use case of not caring, without causing backwards compat issues.

Finally, let's ask ourselves: what value does it bring to developers to pretend a malfunctioning track is not muted?

youennf commented 10 months ago

if a user unplugs the camera, it's a mute event

No, it should be an ended event, the OS knows the camera is gone and most probably surfaces an error to the UA.

if the cat bites off the camera cable, it's a malfunction?

The OS API will tell whether the camera capture is failing or the device disappeared. If the OS is not surfacing anything but frames are not coming as expected, it would be a malfunction. "Expected" here covers Chrome's detection heuristics.

today mute means "I'm not getting any frames despite the track being enabled"

Malfunctioning, though, is not about not getting any frames, as can be illustrated with BT microphones, where drops may happen frequently while some audio still gets through.

I think these two signals would best be exposed independently. For instance, chances are high that the first video frame is missed when being unmuted after being muted for malfunction.

I wonder whether adding a malFunctioning boolean, maybe with corresponding events, might be a way forward for https://github.com/w3c/mediacapture-extensions/issues/39, plus being more explicit about what muted means for capture tracks.

alvestrand commented 10 months ago

Adding a mutedAndWeKnowWhy event (with a "reason" parameter) and leaving the current "muted" as-is would definitely be a reasonable way forward.

henbos commented 10 months ago

Using track.stats to detect frames being dropped every now and then makes sense to me, what I meant by malfunctioning in this context is that we're not getting any frames at all so from the app's point of view it is muted (if we get frames every now and then we are not muted, we just have a high drop ratio)
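
The distinction drawn here (no frames at all versus frames arriving with a high drop ratio) can be sketched as a comparison of two counter readings. This is only an illustration: `FrameCounters`, `classifyInterval` and `maxDropRatio` are hypothetical names, and the assumption that a delivered-frames counter and a total-frames counter are both available is modeled on the track stats idea rather than taken from any shipped API.

```typescript
// Hypothetical counters read from a track at two points in time.
// Assumption: totalFrames counts every frame the source produced and
// deliveredFrames counts those that actually reached the app.
interface FrameCounters {
  deliveredFrames: number;
  totalFrames: number;
}

type IntervalHealth = "no-frames" | "high-drop-ratio" | "healthy";

function classifyInterval(
  previous: FrameCounters,
  current: FrameCounters,
  maxDropRatio: number,
): IntervalHealth {
  const total = current.totalFrames - previous.totalFrames;
  const delivered = current.deliveredFrames - previous.deliveredFrames;

  // Nothing reached the app in this interval: from its point of view the
  // track looks muted, whatever the underlying reason.
  if (delivered <= 0) {
    return "no-frames";
  }

  // Some frames arrived; decide whether too many were lost along the way.
  const dropRatio = total > 0 ? Math.max(0, 1 - delivered / total) : 0;
  return dropRatio > maxDropRatio ? "high-drop-ratio" : "healthy";
}
```

With a threshold of 0.5, an interval that delivers 10 of 30 frames would be classified as "high-drop-ratio", while one that delivers none would be "no-frames".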

youennf commented 10 months ago

Using track.stats to detect frames being dropped every now and then makes sense to me, what I meant by malfunctioning in this context is that we're not getting any frames at all

Using track.stats would work well to detect both cases I think. It could be used to predict that Chrome is about to mute the track, or am I missing something?

so from the app's point of view it is muted

I understand the usefulness of conveying the malfunctioning information to the web app (though it is unclear whether there is agreement on what malfunctioning actually means). I do not see any benefit of conveying this information through the muted boolean compared to conveying this information in a dedicated boolean (other than the fact that this is what Chrome is doing, which is important in itself).

eladalon1983 commented 10 months ago

[@jan-ivar] Turning the question around:

I would prefer a straight answer to my straight question. Exercises in turning questions around do not help us make progress, as this very thread demonstrates.

Recall the question:

[@eladalon1983] When the mute event is fired and the app observes it and turns to handle it, what stats are available to it that would definitively, non-heuristically inform it that the track is muted due to an upstream entity such as the OS or UA?

This question originally referred to Youenn's preceding message. Ironically, it now also refers to Jan-Ivar's subsequent message which sought to brush the very question aside! In this message Jan-Ivar also suggested:

lack of frames in stats + !track.muted = malfunction

The world is asynchronous. It is not possible to definitively correlate the presence/absence of recent frames with the presence/absence of recent mute events. Or if it is possible - please demonstrate how. This was the question. It warrants our attention.
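
The ordering problem behind this question can be illustrated with a toy replay of events as an app would observe them. This is a sketch on a made-up timeline, not a claim about any browser: `Step`, `replay` and the 1000 ms threshold are hypothetical, and the scenario assumes an upstream mute stopped frames at t=0 while the mute event only reached the app at t=1050.

```typescript
// One observation in the app's event loop, in arrival order.
type Step = { atMs: number; kind: "frame" | "mute" | "poll" };
type Verdict = "ok" | "muted" | "malfunction";

// Applies the heuristic "no recent frames + !muted = malfunction" at each poll.
function replay(steps: Step[], stallAfterMs: number): Verdict[] {
  let lastFrameMs = 0;
  let muted = false;
  const verdicts: Verdict[] = [];
  for (const step of steps) {
    if (step.kind === "frame") {
      lastFrameMs = step.atMs;
    } else if (step.kind === "mute") {
      muted = true;
    } else {
      const stalled = step.atMs - lastFrameMs >= stallAfterMs;
      verdicts.push(muted ? "muted" : stalled ? "malfunction" : "ok");
    }
  }
  return verdicts;
}

const verdicts = replay(
  [
    { atMs: 0, kind: "frame" },   // last frame before the upstream mute
    { atMs: 1000, kind: "poll" }, // the mute event has not arrived yet
    { atMs: 1050, kind: "mute" }, // the mute event arrives late
    { atMs: 2000, kind: "poll" },
  ],
  1000,
);
console.log(verdicts); // ["malfunction", "muted"]
```

The first poll reports a malfunction that the later mute event contradicts. Whether a real API could close that window is exactly what the question asks.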

guidou commented 9 months ago

@youennf
For the sake of the argument and to try to move the discussion forward, let's say we find a way to migrate to a definition of muted we all would like.

What would you suggest as a good way to expose reasonably detailed mute reasons? Phrased differently, what would requestUnmute() (or some other API) look like?

youennf commented 9 months ago

The current plan is to use Promise<undefined> MediaSession.setMicrophoneActive which could return different errors or error types. This is one place where different mute reasons could be surfaced. Other places might be the togglemicrophone action handler dictionary, or MediaStreamTrack like suggested above.

That said, as it is right now, togglemicrophone would provide a boolean value to muted tracks that we thought might be sufficient in the short term. Hence why I would not concentrate on this topic right now.

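Instead, I'd like to first validate that this minimal MediaSession API is good enough:

  • It should allow websites like Google Meet to fix the double mute issue, whether mute can fire only for this reason or for other reasons like in Chrome
  • It should be extensible to support more advanced cases we want to tackle in the future (multiple microphone capture...)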

guidou commented 9 months ago

The current plan is to use Promise<undefined> MediaSession.setMicrophoneActive which could return different errors or error types. This is one place where different mute reasons could be surfaced. Other places might be the togglemicrophone action handler dictionary, or MediaStreamTrack like suggested above.

That said, as it is right now, togglemicrophone would provide a boolean value to muted tracks that we thought might be sufficient in the short term. Hence why I would not concentrate on this topic right now.

Instead, I'd like to first validate that this minimal MediaSession API is good enough:

  • It should allow websites like Google Meet to fix the double mute issue, whether mute can fire only for this reason or for other reasons like in Chrome
  • It should be extensible to support more advanced cases we want to tackle in the future (multiple microphone capture...)

What is the right place to discuss the Media Session proposal? It looks like a potential way forward, assuming it supports having a per-device state.

youennf commented 9 months ago

What is the right place to discuss the Media Session proposal?

See https://github.com/w3c/mediasession/pull/312, https://github.com/w3c/mediasession/issues/307, https://github.com/w3c/mediasession/issues/279 and https://github.com/w3c/mediasession/issues/278.

There is a plan to add support for screen share (https://github.com/w3c/mediasession/issues/306).

assuming it supports having a per-device state.

When we discussed this particular topic, one idea was to add a member to MediaSessionActionDetails, like a deviceId. But it was unclear whether it was useful enough in the short term to work on it. Filing an issue in MediaSession repo might be a good idea to keep track of this.

Similarly, we might want to add a state to MediaSessionActionDetails (to know whether action is about muting or unmuting). I'll probably work on this once the basic PRs are all landed.

alvestrand commented 9 months ago

Note that the w3c/mediasession API does not admit of the existence of multiple microphones. So for all applications that need to distinguish between muting of multiple microphones, it is not a suitable API.

So "the current plan" should be phrased differently - there is no WG consensus for any particular plan. "My proposal" would be more correct.

guidou commented 9 months ago

What is the right place to discuss the Media Session proposal?

See w3c/mediasession#312, w3c/mediasession#307, w3c/mediasession#279 and w3c/mediasession#278.

When we discussed this particular topic, one idea was to add a member to MediaSessionActionDetails, like a deviceId. But it was unclear whether it was useful enough in the short term to work on it. Filing an issue in MediaSession repo might be a good idea to keep track of this.

Similarly, we might want to add a state to MediaSessionActionDetails (to know whether action is about muting or unmuting). I'll probably work on this once the basic PRs are all landed.

Thanks. I left a couple of comments in some of the issues.

I think a solution based on MediaStreamTrack directly would be more suitable for the VC use cases, since applications already have access to the tracks they are playing. Using media session (provided it is augmented to properly support the use case) would require getting the device ID from the MediaStreamTrack, then separately looking up and/or maintaining the right state/objects and filtering events based on the device ID and associating all that to the MediaStreamTracks.
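
The bookkeeping described above might look roughly like the following TypeScript sketch. It is illustrative only: `TrackLike` and `DeviceTrackRegistry` are hypothetical names, and a device-scoped Media Session event carrying a `deviceId` does not exist today (it was only floated as an idea earlier in this thread). The one real API relied on is that a track's `getSettings()` can report a `deviceId`.

```typescript
// Structural stand-in for MediaStreamTrack so the sketch runs outside a browser.
interface TrackLike {
  readonly id: string;
  getSettings(): { deviceId?: string };
}

// State the app would have to maintain itself in order to route a
// device-scoped event back to the tracks it is actually playing.
class DeviceTrackRegistry {
  private readonly byDevice = new Map<string, Set<TrackLike>>();

  add(track: TrackLike): void {
    const deviceId = track.getSettings().deviceId;
    if (deviceId === undefined) return; // nothing to key the track on
    let tracks = this.byDevice.get(deviceId);
    if (tracks === undefined) {
      tracks = new Set<TrackLike>();
      this.byDevice.set(deviceId, tracks);
    }
    tracks.add(track);
  }

  remove(track: TrackLike): void {
    const deviceId = track.getSettings().deviceId;
    if (deviceId === undefined) return;
    const tracks = this.byDevice.get(deviceId);
    if (tracks === undefined) return;
    tracks.delete(track);
    if (tracks.size === 0) this.byDevice.delete(deviceId);
  }

  // Called when a (hypothetical) device-scoped event arrives.
  tracksFor(deviceId: string): TrackLike[] {
    const tracks = this.byDevice.get(deviceId);
    return tracks === undefined ? [] : Array.from(tracks);
  }
}
```

A signal delivered on the MediaStreamTrack itself would make this registry unnecessary, which is the trade-off being pointed out.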

dontcallmedom-bot commented 2 months ago

This issue was discussed in WebRTC August 27 2024 meeting – 27 August 2024 (Moving Forward with Mute)