Closed handellm closed 1 year ago
I think this is OK. Was wondering whether the interval is reset when the encoder decides on its own to insert a keyframe, but concluded after reading again that the language is sufficiently nonrestrictive that it does not matter.
Would be good to have 2 slides on this at the February interim, just to make sure it's been on the agenda.
To my understanding most encoders configure keyframe intervals in number of frames, not based on time. What's the intended way for an implementation to convert the time interval to a frame interval given that video MediaStreamTracks by definition have no fixed frame rate?
Would it be a better fit to define the keyframe interval as a frame interval? The application might have a better understanding of the rate at which it will feed frames to the MediaRecorder. Is there specific use case where a time interval is a better fit?
One of the main use cases for recurring keyframes in a non-streamed setting that I can think of is to be able to seek to any given point in the file by jumping to the keyframe prior to that point and decoding your way there. The speed of this process is determined by the number of frames between the keyframe and the point one is seeking to. So to be able to get consistent seek times with a variable frame rate, a key frame interval expressed as a frame interval seems favorable.
A min and a max interval seems like an even better fit for most encoders, but I guess MediaRecorder could be free to infer these.
To my understanding most encoders configure keyframe intervals in number of frames, not based on time.
On macOS/iOS you have the choice. In many case, there is the possibility to mark a specific frame to be encoded as key frame (see https://www.w3.org/TR/webcodecs/#dictdef-videoencoderencodeoptions for instance).
For what is worth, WebKit is doing the following: a key frame every 2 seconds or 60 frames, whichever comes first. I wonder whether we could provide suggestions to implementors for the default value, as a non normative statement for instance.
Getting a better sense of the use case to go with either time or/and frame count would be good.
Is there an example of an encoder where per-frame control of keyframes isn't possible?
The use case is to let the MediaRecorder support basic control over seekability without requiring post-processing and give the user a basic knob to control the tradeoff between muxed size, quality & seek accuracy. Another useful consequence is also that parts of the stream can be excised with specified temporal accuracy without requiring transcoding.
Is there specific use case where a time interval is a better fit?
My bet is the user thinks more about seconds than about frame intervals. I struggle to see use cases where a count interval will be better. While it's possible to influence min and max track framerates with getDisplayMedia
/getUserMedia
and applyConstraints
, my understanding is the default source behavior is unspecified. Which means a frame interval gives an undefined temporal accuracy in such a case, or when it's impossible to know the configured rates.
A consequence of a temporal interval is that muxed streams with only keyframes may result. If the output bitrate is constrained, the muxed stream may be of lower quality in stretches with low framerate. But MediaRecorder is supposed to be a simple API. Where more refined control than "increased bitrate config" is needed, the user can always go to WebCodecs.
With the same rationale, I think providing both count & temporal control + logic spec might not be needed for this API.
I wonder whether we could provide suggestions to implementors for the default value, as a non normative statement for instance.
Do you have a suggestion in mind?
Do you have a suggestion in mind?
I would look at whether Chrome and Firefox have defaults (or will have after this change), similarly to WebKit ones. Maybe we can somehow harmonise values.
Yes, that makes sense. The current story for Chrome is that there are generally no keyframes forced midstream, except if the mimeType results in getting an accelerated codec, then you get a count based interval of 100 frames (which plays badly with lower framerate input streams). Happy to change this closer to FF/Safari.
I prefer interval over number of frames because I think key frames have a lot to do with seekability, and if this was a count instead of an interval I feel like you'd end up having to calculate it anyway. That said this isn't really my area, so feel free to continue the discussion. Either way I'm happy, +1
Is there an example of an encoder where per-frame control of keyframes isn't possible?
Is there a risk with the per-frame API for keyframes, that the encoded stream has more visual artifacts as a result of the encoder having a harder time predicting how to allocate the target bitrate, than if it were to configure the encoder with a per-frame interval?
I have seen behavior in libvpx suggesting this, but I'm not sure how much of it was due to using VPX_EFLAG_FORCE_KF and how much was due to a libvpx bug.
IIRC Firefox sets a keyframe interval of 10 seconds, for historical reasons [1], and occasionally recalculates the frame-interval based on recent incoming framerate, so it can configure the encoder with this.
[1] our ebml writer only creates a new cluster on a keyframe, and the max timestamp within a cluster is 32 seconds.
I prefer interval over number of frames because I think key frames have a lot to do with seekability, and if this was a count instead of an interval I feel like you'd end up having to calculate it anyway.
I'm not sure what you mean by seekability, but consider these cases:
This is the tradeoff I see. Perhaps there's something to be said about predictability and bitrate allocation (and if that is the case, it would be in favor of frame intervals?) but I don't really know.
Is there a risk with the per-frame API for keyframes, that the encoded stream has more visual artifacts as a result of the encoder having a harder time predicting how to allocate the target bitrate, than if it were to configure the encoder with a per-frame interval?
Is there specific use case where a time interval is a better fit?
It doesn't have anything to do with per-frame API, but with a fixed bitrate budget, quality per frame becomes lower as the ratio of num keyframe/num deltaframes gets higher. When you think about it this is more likely to happen with time-based control with for example low-framerate slideware. With ultimate control over keyframe generation, this seems like a situation a count-based control makes sense. (Note, ultimate control isn't entirely possible. Encoders anyway emit keyframes when diffs to reference frames becomes too large, so count-based will not be predictable in this situation.)
This is the tradeoff I see.
Worth mentioning here's it's also the case that with count-based intervals you can't excise with specified accuracy. Say we've recorded a presentation with occasional slide changes. You'd like to save 0.58 minutes - 1.30 minutes in a webm somewhere else. If the framerate was low in the 0.58 section, with count-based intervals chances are you could have to save an area tens of seconds before that point.
How about specifying both? Say MR configuration is 2 parameters X (min time between keyframes) and Y (min framecount between keyframes):
void OnFrame(const VideoFrame& frame) {
++currentFrameIndex;
now = getCurrentTime();
forceKeyFrame = now - lastKeyFrameTimestamp >= X &&
currentFrameIndex - lastKeyFrameIndex >= Y;
if (forceKeyFrame) {
lastKeyFrameTimestamp = now;
lastKeyFrameIndex = currentFrameIndex;
}
Encode(frame, forceKeyFrame);
}
This supports three cases:
Looks good to me. From a user perspective, I'd also say that time-based control would make more sense. Can't speak for what's better with encoders though.
Updated to include support for both time and count base interval. PTAL!
Editors meeting: @youennf and @jan-ivar had a few comments (editorial and non-editorial) that need addressing.
Two comments:
Youenn, for some reason I can't reply to your comment.
It would be good to rephrase the use of the new slots like the other slots in terms of "Constrain the configuration of |recorder| to encode"....
Tried to do this in 0df3dba, wdyt?
In macOS and iOS, when setting both values, I think the min is used (I'll check) but the spec is asking for the max.
The rationale for this spec text is what I discussed with Harald earlier:
"The rationale is to support variable-rate sources where you get low per-frame quality (QP) in stretches where FPS is low with a pure time-based control, since the ratio of keyframes (in which quality is lower) to delta frames (in which quality is enhanced) would be higher in the low FPS stretches. See also earlier comment in https://github.com/w3c/mediacapture-record/pull/216#issuecomment-1415989756"
I realize Apple may not currently be using those rules, but on the other hand the spec includes Apple's implementation by "If neither of the slots are defined, the User Agent may emit key frames as it deems fit.". Wdyt?
Friendly ping!
Thanks for comments! New iteration uploaded, PTAL again.
Youenn, any thoughts from you?
Thanks for catching the seconds! Approved with milliseconds to follow § 8.3. Use milliseconds for time measurement and for consistency with existing timeslice.
This PR adds support for video key frame configurability in the MediaRecorder API.
Preview | Diff