w3c / mediacapture-extensions

Extensions to Media Capture and Streams by the WebRTC Working Group
https://w3c.github.io/mediacapture-extensions/

MediaStreamTrack.getFrameStats() #77

Closed · henbos closed 1 year ago

henbos commented 1 year ago

Fixes #76.



jan-ivar commented 1 year ago

I think this should be converted to a draft, since it doesn't have any agreement yet.

youennf commented 1 year ago

I think this deserves some WebRTC WG meeting time.

henbos commented 1 year ago

> I think this deserves some WebRTC WG meeting time.

Ack. I'll join the meeting on Tuesday.

henbos commented 1 year ago

TODO based on Virtual Interim: remove framesEmitted from this PR (either skip it or revisit in the future with a compelling use case)

henbos commented 1 year ago

Also, I was asked in chat "what if the OS drops the frame?", so maybe framesCaptured should start counting at entry to the user agent to avoid confusion.

alvestrand commented 1 year ago

@youennf has questions about the getStats as a model, and will provide commentary.

youennf commented 1 year ago

It is not clear to me whether we want getStats or a simple promise-based getter as a method. Are we expecting more values in getStats?

Also, it seems that we want to know whether the counter is tied to the native frame rate or the potentially decimated frame rate. If the former, we would need to know what the actual expected native frame rate is. This is not super clear from the PR.

henbos commented 1 year ago

We could make this Promise<unsigned long long> framesCaptured() instead if we think it is unlikely that we will add more counters. My thinking was that getStats() could ensure everything comes from the same snapshot if we do add more metrics, but I don't know if we will.

We already know the expected native frame rate, that's track.getSettings().frameRate (what the camera was configured to do). framesCaptured are the frames that were actually produced (prior to any frame decimation that the user agent may have done).
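
For illustration, here is a sketch of the two shapes being weighed, as TypeScript-style declarations. All names here are hypothetical, not an agreed API:

```ts
// Shape A: a stats method, so that several counters (if more are ever added)
// come from one snapshot. "VideoFrameStats" and its members are hypothetical.
interface VideoFrameStats {
  framesCaptured: number;
}
interface TrackWithStats extends MediaStreamTrack {
  getFrameStats(): Promise<VideoFrameStats>;
}

// Shape B: a single promise-based getter, if no further counters are expected.
interface TrackWithGetter extends MediaStreamTrack {
  framesCaptured(): Promise<number>;
}
```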

henbos commented 1 year ago

E.g. I open the camera at 30 fps (settings.frameRate: 30) but due to poor lighting conditions I only get 25 fps (framesCaptured would increment by 25 every second).
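
A minimal sketch of how a page could observe this, assuming the hypothetical getFrameStats() shape sketched above: poll once per second and compare the delta with getSettings().frameRate.

```ts
async function logCaptureRate(track: TrackWithStats): Promise<void> {
  const configured = track.getSettings().frameRate; // e.g. 30
  let previous = (await track.getFrameStats()).framesCaptured;
  setInterval(async () => {
    const current = (await track.getFrameStats()).framesCaptured;
    // e.g. logs "configured: 30 fps, observed: 25 fps" in poor lighting
    console.log(`configured: ${configured} fps, observed: ${current - previous} fps`);
    previous = current;
  }, 1000);
}
```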

youennf commented 1 year ago

> E.g. I open the camera at 30 fps (settings.frameRate: 30) but due to poor lighting conditions I only get 25 fps (framesCaptured would increment by 25 every second).

The native frame rate might be 60 fps, though, and the UA might decimate it to 30 fps to achieve the settings frame rate. The UA is not exposing the information that the camera is capturing at 60 fps. Which of these frames (pre- or post-decimation, for instance) are we supposed to count?

Said differently, if we have two tracks that share the same source but have different frame rates, would they have the same counter or not? My understanding is that you want the post-decimation counter, but this is not clear in this PR.

> I don't know if we will.

Agreed, I am not sure either that we will actually add more info. Hence I am tempted to go with a single getter.

Two things to consider:

- Maybe for audio, we could expose things around audio drops?

henbos commented 1 year ago

> My understanding is that you want the post-decimation counter, but this is not clear in this PR.

I was going back and forth on this, but I agree that post-decimation makes the most sense, so tracks with different frame rates have different framesCaptured.

> Hence I am tempted to go with a single getter.

Done. New definition:

                Let <var>framesCaptured</var> be a snapshot of the number of
                non-decimated captured frames; this includes frames that were
                dropped by the user agent for any reason other than decimation
                in order to achieve the configured frame rate settings.

PTAL
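
To make the post-decimation semantics concrete, a hedged sketch (the counter name is still hypothetical): two tracks sharing one source but constrained to different frame rates should report different counts.

```ts
// Inside an async context:
const stream = await navigator.mediaDevices.getUserMedia({ video: { frameRate: 30 } });
const [track30] = stream.getVideoTracks();
const track15 = track30.clone();
await track15.applyConstraints({ frameRate: 15 }); // UA decimates this track

// After one second of capture, with post-decimation counting, roughly:
//   framesCaptured on track30 ≈ 30 (frames actually produced for this track)
//   framesCaptured on track15 ≈ 15 (frames removed by decimation are not counted)
```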

vr000m commented 1 year ago

Is there a timestamp reported somewhere?

Since framesCaptured returns a long, I think it is just reporting the latest counter. If something in the pipeline breaks while the source is still producing frames, and they are being dropped for some reason by the encoder, a timestamp would show when that issue started to occur.

Since this is not inheriting from getStats, we should make sure that there is a timestamp, i.e., a timestamp of when the underlying object was updated; this may happen each time framesCaptured is incremented.

henbos commented 1 year ago

Re: @vr000m

I figured you could just use the timestamp that the promise was resolved for this, but technically there could be a delay between the in-parallel steps and the promise resolving.

If we want a timestamp, we'd need to change this back to a stats dictionary to have multiple members.

@youennf do you have opinions?

vr000m commented 1 year ago

Let's say an issue occurred: unless the endpoint is polling framesCaptured() all the time, we won't know when the issue started to manifest.
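
A sketch of why a timestamp member (hypothetical name) helps here: even an infrequent poller can bound when frames stopped flowing.

```ts
interface TimestampedFrameStats {
  timestamp: DOMHighResTimeStamp; // hypothetical: when the counters last changed
  framesCaptured: number;
}

// If framesCaptured has stalled, stats.timestamp still records when the last
// frame was counted, i.e. approximately when the issue began.
function millisSinceLastFrame(stats: TimestampedFrameStats): number {
  return performance.now() - stats.timestamp;
}
```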

henbos commented 1 year ago

Friendly ping

youennf commented 1 year ago

It is still a draft. @henbos, can we click 'Ready for review'?

henbos commented 1 year ago

I added an internal slot as suggested by @youennf and changed this back to getStats() as suggested by @jan-ivar. Can the both of you take another look?

jan-ivar commented 1 year ago

We might also want to put the final API in front of the working group before merging.

henbos commented 1 year ago

I updated the PR again. I still have the timestamp in the same PR, but I clarified that this applies to any video track and tried to phrase it more like a requirement. I also found another spec which has more requirement-like phrasing: https://w3c.github.io/media-playback-quality/#videoplaybackquality-interface

Please take another look and tell me if we're getting any closer to merging this.
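
For comparison, the referenced interface is already shipped on HTMLVideoElement and uses a similar snapshot-of-counters design:

```ts
const video = document.querySelector("video")!;
const q = video.getVideoPlaybackQuality();
console.log(q.creationTime, q.totalVideoFrames, q.droppedVideoFrames);
```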

vr000m commented 1 year ago

Is this on the agenda for Jan 2023 interim?

henbos commented 1 year ago

I can create a slide for it

henbos commented 1 year ago

Virtual Interim feedback:

dontcallmedom commented 1 year ago

"Total frames" makes more sense than "dropped".

I like that this also allows adding more fine-grained categories over time if we find that more diagnostics would be useful.

henbos commented 1 year ago

I added a note about the total frames and calculating lost frames from total frames. Since this is only a requirement and not an algorithm, and frames could get lost anywhere, including outside the renderer process, I noted that this is best effort.
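
A sketch of that calculation, with hypothetical member names; since frames can be lost outside the user agent's view, the result is best effort rather than exact.

```ts
interface FrameTotals {
  totalFrames: number;     // hypothetical: all frames produced for this track
  deliveredFrames: number; // hypothetical: frames that reached the sinks
  discardedFrames: number; // hypothetical: frames deliberately discarded (e.g. decimation)
}

// Best effort: anything the more specific counters do not account for is "lost".
function lostFrames(s: FrameTotals): number {
  return s.totalFrames - s.deliveredFrames - s.discardedFrames;
}
```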

henbos commented 1 year ago

I know I said that when the track is muted the frame counters should stop incrementing, but at the time I was thinking about camera frames. @handellm reminded me that a muted track in fact produces zero-content frames which also get delivered, so I updated the PR to reflect that total frames may in fact increment even while muted.

youennf commented 1 year ago

> @handellm reminded me that a muted track in fact produces zero-content frames which also get delivered

In WebKit at least, a muted track produces no frames at all. It would be nice if we could be consistent across UAs. RTCPeerConnection will send encoded zero-content frames periodically if tracks are muted, but this is a different mechanism.

Can you validate Chrome behaviour here? Say with VideoTrackProcessor or rvfc maybe?
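
One way to check, sketched with requestVideoFrameCallback ("rvfc"): count frames delivered to a `<video>` sink before and after disabling the track (inside an async context).

```ts
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getVideoTracks();
const video = document.createElement("video");
video.srcObject = stream;
await video.play();

// Count each frame presented to the video sink.
let frames = 0;
const onFrame = () => { frames++; video.requestVideoFrameCallback(onFrame); };
video.requestVideoFrameCallback(onFrame);

track.enabled = false; // disable (muting depends on the source)
frames = 0;            // reset, then see whether any frames still arrive
setTimeout(() => console.log(`frames while disabled: ${frames}`), 2000);
```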

guidou commented 1 year ago

>> @handellm reminded me that a muted track in fact produces zero-content frames which also get delivered
>
> In WebKit at least, a muted track produces no frames at all. It would be nice if we could be consistent across UAs. RTCPeerConnection will send encoded zero-content frames periodically if tracks are muted, but this is a different mechanism.
>
> Can you validate Chrome behaviour here? Say with VideoTrackProcessor or rvfc maybe?

Muted tracks in Chrome don't produce frames either.

henbos commented 1 year ago

Oh... so when the spec says "A muted or disabled MediaStreamTrack renders either silence (audio), black frames (video), or a zero-information-content equivalent" this refers to what the sink is doing, not what the track's source is doing?

henbos commented 1 year ago

I updated the PR to instead say "If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment."

henbos commented 1 year ago

> I updated the PR to instead say "If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment."

If "frames flowing" is an implementation detail, I could rephrase this to simply "If the track is muted or disabled, then total frames does not increment."

vr000m commented 1 year ago

Depending on which type of mute is implemented, the source will send encrypted zero-content frames (silent samples); I think it's done for security reasons within the peerconnection.

However, in this case, if you mute the source, then the source isn't going to produce any frames.

henbos commented 1 year ago

I think the current phrasing is good. Can we merge this PR soon?

alvestrand commented 1 year ago

Varun, are you OK with merging? What a peerconnection does with the frames is a webrtc matter, not a mediacapture matter.

vr000m commented 1 year ago

Agree on the pc being a matter for another spec. I've followed the conversation in this PR but haven't looked at the latest changes.

Give me a day to review the latest changes.

alvestrand commented 1 year ago

So if the question gets answered, Youenn can give an approval, and then we can merge it. There seem to be no other issues.

henbos commented 1 year ago

The comments have been addressed, PTAL

dontcallmedom-bot commented 1 year ago

This issue was discussed in the WebRTC January 2023 meeting – 17 January 2023 (MediaStreamTrack Frame Rates 🎞︎)