Open youennf opened 2 years ago
I think it would be better to first discuss whether we need this (this information can be gathered from other sources, or region producer could instruct the capturer that its region is no longer useful). If we need this mechanism, maybe it deserves its own mechanism.
I'd be happy to drop everything mute-related from the spec, and reopen this topic if+when it's relevant in the future. I can envision several possible mechanisms to address this, but I don't think it's a crucial part to an MVP. Wdyt @jan-ivar and @youennf?
I'd be happy to drop everything mute-related from the spec
Sounds good.
and reopen this topic if+when it's relevant in the future.
It seems worth opening a new issue to keep track of this area (crop target is no longer valid, e.g. the HTMLElement was destroyed; crop target is valid but will trigger zero-pixel frames). Should the UA take default actions? Should web pages have an easy way to detect these cases?...
I'd be happy to drop everything mute-related from the spec
Sounds good.
To clarify, I'll be removing references to `mute`, but otherwise the behavior will remain the same - no new frames will be delivered if the application specified a crop and either (i) it's not possible to crop to that element, e.g. if it's gone, or (ii) cropping to that element results in zero-pixel frames, e.g. if the crop-target has been scrolled out of the viewport.
It seems worth opening a new issue to keep track of this area
I think we can keep using this issue and continue the discussion here. I've applied the label `Improvement` to make it clear that we've agreed this is a non-blocking issue.
Continuing the discussion - there are multiple cases where the application either cannot recognize what's going on, or it's not ergonomic for it to do so, or doing so would introduce a delay due to cross-origin communication. That's why I used the `mute` signal. If `mute` is problematic because it's hard to tell apart from existing uses, we can introduce a similar signal (maybe `interrupted`) if we find out developers are asking for it. IMHO it's fine to start by using the same signal for all interruption reasons.
Another option, and one which I like better, is to keep using `muted`, `onmute` and `onunmute`, but to add a `mute_causes` vector:
```webidl
enum MuteCause {
  "user",
  "target_lost",
  "zero_pixel_frame"
};

partial interface MediaStreamTrack {
  readonly attribute FrozenArray<MuteCause> mute_causes;
};
```
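If something along these lines were adopted, an application could branch on the cause in its mute handler. A hypothetical sketch - `mute_causes` and its enum values are only the proposal above, not a shipped API:

```javascript
// Hypothetical: assumes the proposed mute_causes attribute exists on the track.
// Returns a human-readable explanation for why the track is muted.
function describeMute(track) {
  const causes = track.mute_causes ?? [];
  if (causes.includes("target_lost")) return "crop target was destroyed";
  if (causes.includes("zero_pixel_frame")) return "crop target has zero visible pixels";
  if (causes.includes("user")) return "user muted the capture";
  return "muted for an unknown reason";
}

// Usage (in a page holding a cropped track):
// track.onmute = () => console.warn(`Track muted: ${describeMute(track)}`);
```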
@jan-ivar, could you please take a look at the current language around the mute events? As per agreement with @youennf, I'll remove mentions of muting before we move to FPWD, but I want to give you some time to have a look and form an opinion about what we'd want to do afterwards.
Mute has traditionally been "this track is not producing data, this was intentional, and we're not telling you why". https://w3c.github.io/mediacapture-main/getusermedia.html#track-muted
Do we have an operational need for the application to know why the track is muted, that can't be solved by the capturer inspecting the situation in other ways, for instance by inspecting the HTMLElement it did produceCropTarget() on?
If adding MuteCause, I'd add this to CropTarget, not to MediaStreamTrack.
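For the same-document case, the "inspecting the situation in other ways" approach could be sketched with an `IntersectionObserver`. This is only an illustration: `watchCropTargetVisibility` and its callback are made-up names, and the element passed in is assumed to be the one the page created a crop target from.

```javascript
// Sketch: detect when the element backing a crop target leaves the viewport,
// without relying on any mute signal from the track itself.
function watchCropTargetVisibility(cropTargetElement, onChange) {
  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      // entry.isIntersecting === false roughly corresponds to the
      // "zero pixel frames" situation discussed above.
      onChange(entry.isIntersecting);
    }
  });
  observer.observe(cropTargetElement);
  return () => observer.disconnect(); // call to stop watching
}

// Usage:
// const stop = watchCropTargetVisibility(element, (visible) => {
//   if (!visible) console.warn("Crop target scrolled out of view");
// });
```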
...that can't be solved by the capturer inspecting the situation in other ways...
If the capturing frame is cross-origin from the frame hosting the crop-target, this inspection requires asynchronous messaging, and there's an associated cost. Applications that need to react[*] won't be able to do so in a timely fashion. (It's also hard to handle robustly, e.g. when the user repeatedly scrolls the crop-target in and out of view.)
[*] That said, the applications I currently have in mind would probably not allow crop-targets to be scrolled away. I cannot presently think of a realistic use-case where an application would really need this signal. That's why I'm happy to remove the discussion of muting from the spec and to continue this discussion as a lower-priority issue.
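To illustrate the asynchronous cost: a capturer that is cross-origin from the frame hosting the crop target would need something like a `postMessage` round trip just to learn whether the target is still visible. The message names below are an ad-hoc protocol invented for this sketch, not anything from the spec:

```javascript
// Capturing frame: ask the embedded (cross-origin) iframe about its crop target.
// "crop-target-visible?" / "crop-target-visible" is a made-up protocol.
function queryCropTargetVisible(iframe, timeoutMs = 200) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(() => {
      window.removeEventListener("message", onMessage);
      reject(new Error("No answer from embedded frame"));
    }, timeoutMs);
    function onMessage(event) {
      if (event.data?.type === "crop-target-visible") {
        clearTimeout(timer);
        window.removeEventListener("message", onMessage);
        resolve(event.data.visible);
      }
    }
    window.addEventListener("message", onMessage);
    iframe.contentWindow.postMessage({ type: "crop-target-visible?" }, "*");
  });
}
```

Even when this works, the answer arrives at least one task later than a synchronous check would, which is the timeliness problem described above.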
Mute has traditionally been "this track is not producing data, this was intentional, and we're not telling you why".
Non-rhetorical questions: Is the "not telling you why" part important? Is it important to hide the fact that the user pressed "mute" in the browser's UX? And if so, do we have enough possible causes in the mute bin that "user probably muted" could not be inferred? Was it a conscious decision to hide the cause, and if so, is it worth "relitigating" it?
Should we close this issue now that mute language has been removed from the spec?
@youennf, it's your issue. Could you please close it?
I have an application using a cropped track, but I want to be able to hide the video element without causing the feed to be "muted", so that the stream still transmits pixels. To be clear: the crop target is visible, but the video element the track is being rendered to is not. Can this be considered?
Currently, https://www.w3.org/TR/mediacapture-region/#empty-crop-target implies that a hidden video element will be an empty crop target and will therefore be muted. To prove this to myself: in my application, every time I hide the video element with CSS (i.e. `visibility: hidden` or positioning it off the page), the `onmute` event of the video track from the stream (from `navigator.mediaDevices.getDisplayMedia`) fires.
I am wondering if showing the cropped video is strictly necessary. Would giving developers the option not to mute the stream reduce load for users, by not requiring the client to have a video element (i.e. not requiring `video.srcObject = stream;`)?
Currently it seems it's not possible to have a crop target without setting the `srcObject` of a video element and having that video be within the user's view, because the video track will automatically be muted when it's not attached to a visible video element.
If not just for my specific use case, I believe it may be useful to allow sending the stream to an empty crop target if a developer wanted to use the video track/stream only to send to a web worker that processes it.
Example with the navigator API, without a crop target, without a video tag, but still streaming:
The purpose of the following investigation is to contrast the handling of crop target's mute with other streaming sources.
https://github.com/w3c/webcodecs/blob/main/samples/capture-to-file/encode-worker.js
^ this example uses workers to parse video from `new MediaStreamTrackProcessor(videoTrack)`, but it also includes `video.srcObject = stream;`. It likewise uses the navigator API to get client video to this processor. However, none of the APIs used in this example require developers to keep the video element visible in the window.
Since this example uses the navigator API to produce a stream, I would expect it to also have a "muted" mechanism for invisible video tags, but it does not.
Further checking this example at the live app (https://webcodecs-samples.netlify.app/capture-to-file/capture-to-file.html): if you inspect the element and give the video tag the CSS `visibility: hidden`, or change the positioning to `position: absolute; left: 1000000px;`, you still get the recording as intended, and there is no "mute" behavior even though the video is not visible.
A developer can even omit the video element entirely (go to the website, inspect element, delete the HTML video tag, click record, and verify the recording was still taken)! Even without the video tag, the video could still be processed.
TL;DR: Another video-stream reference I've seen with the navigator API does not require the video tag to be present, so it does not seem to be using "muted" in a consistent way. Additionally, some apps may want to do video processing with workers and not necessarily show the unprocessed video back to the user, so I believe it may be useful to allow streaming to an empty crop target (by not muting the track if the video is not in frame). I apologize if this is the incorrect place to be mentioning all of this - thanks for reading!
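For reference, the worker-only processing path described above (no video element at all) can be sketched roughly as follows. This is a hedged sketch based on the WebCodecs sample's approach; `startWorkerProcessing`, `workerUrl`, `someElement`, and the posted message shape are illustrative, not part of any spec.

```javascript
// Main thread: capture, optionally crop, and hand the track's frames to a
// worker for processing. No <video> element is involved at any point.
async function startWorkerProcessing(workerUrl) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  const [track] = stream.getVideoTracks();

  // Optional region capture (in browsers that support it):
  // const target = await CropTarget.fromElement(someElement);
  // await track.cropTo(target);

  const processor = new MediaStreamTrackProcessor({ track });
  const worker = new Worker(workerUrl);
  // Transfer the readable stream of VideoFrames to the worker, which can
  // consume it with processor.readable.getReader() and close each frame.
  worker.postMessage({ readable: processor.readable }, [processor.readable]);
  return { track, worker };
}
```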
Hi @thomas-gg. I am having some trouble following your message. It seems to assume that the cropped video must be played back through a video tag, which is not true. Cropping also works with simply saving the video to disk, transmitting it remotely, etc. The current issue deals with the crop target being empty.
But I may have misunderstood you. If so, please try explaining the issue more briefly and simply.
Hi @eladalon1983, I am very sorry for any confusion.
I provided a video to show my issue. In the video below I am using the code from https://developer.chrome.com/docs/web-platform/region-capture/#demo. I tried to provide as short a video as possible, and this is replicable by copying the demo from the link above without changing any code (using a live server).
Doing the same steps as in the video with this demo (the demo within this GitHub repository) shows you are correct, and the video tag is not necessary when the crop target's video is not in an iframe (State: active is shown, not State: muted!). I think the major difference between that demo and my own project is that I am sending the crop target to an iframe's video tag.
*Edit: I sincerely apologize for the confusion - I did not investigate the automatic muting without using an iframe. I actually believe I can restructure my project to not use an iframe, and I wanted to thank you for your time! I also understand your distinction about the crop target being empty; my confusion was about the streamed-to object being empty, so I apologize for putting my confusion here. Thank you!
Thank you for the clarifications. Please note that this GitHub repo is appropriate for spec issues. If you find any Chrome-specific issues (i.e. issues with our implementation of the spec), then I encourage you to file a bug here and assign `Blink>GetDisplayMedia>RegionCapture` under Components. That should be enough to ensure I get CCed (eladalon@chromium.org).
The spec is not really clear about what muting a track means. It probably means running the steps at https://w3c.github.io/mediacapture-main/#set-track-muted.
Using muted is potentially not a great idea, since the application might not know whether the track is muted because of cropping or for some other reason. For instance, Safari does allow a user to mute/unmute.
Also, muted is usually tied to the source: if the source is muted, all related tracks are muted. In this case, muted would become a property of the track. muted is assumed to be "outside the control of web applications", while in this case applications could potentially mute/unmute it themselves.
I think it would be better to first discuss whether we need this (this information can be gathered from other sources, or region producer could instruct the capturer that its region is no longer useful). If we need this mechanism, maybe it deserves its own mechanism.