w3c / imsc-hrm

IMSC Hypothetical Render Model
https://w3c.github.io/imsc-hrm/spec/imsc-hrm.html

Allow short empty ISDs between non-empty ISDs #50

Closed palemieux closed 1 year ago

palemieux commented 1 year ago

Closes #49 and #52



palemieux commented 1 year ago

@nigelmegitt See revised PR:

palemieux commented 1 year ago

We probably also need to note that the expectation is that empty ISDs are presented but the complexity/presentation time cost of clearing the display is counted only once.

Where my mind is at: painting and presenting empty ISDs involves no complexity since not displaying content (as opposed to clearing a buffer) does not involve work.

nigelmegitt commented 1 year ago

Where my mind is at: painting and presenting empty ISDs involves no complexity since not displaying content (as opposed to clearing a buffer) does not involve work.

I don't think this is consistent with the spec, which does assign some complexity to CLEAR. Setting all the pixels of the output buffer to be empty/transparent clearly does take finite time.

I'd be happy with an adjustment that separates clears from paints and assigns a fixed complexity to clears. Doing that as a general case would obviously be a worst case scenario, since a real implementation could cache a set of rectangles in which content had been painted in the most recently presented ISD and only clear those rectangles. I think worst case is what we want here.
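
To make the difference between the two clearing strategies concrete, here is a minimal sketch. It assumes clear cost is proportional to the cleared area, normalized so that the full output buffer has area 1.0. The `Rect` type and both function names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in normalized output-buffer units (0..1)."""
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        return self.w * self.h


def worst_case_clear_area() -> float:
    """Fixed cost: the whole output buffer is cleared, whatever was painted."""
    return 1.0


def cached_rect_clear_area(painted: list[Rect]) -> float:
    """Optimized cost: clear only the rectangles painted for the previous ISD.

    Overlapping rectangles are counted more than once in this sketch, so the
    sum is capped at the full-buffer area.
    """
    return min(1.0, sum(r.area for r in painted))


# Two caption lines near the bottom of the frame (illustrative values).
painted = [Rect(0.1, 0.8, 0.8, 0.1), Rect(0.1, 0.7, 0.5, 0.1)]
optimized = cached_rect_clear_area(painted)
fixed = worst_case_clear_area()
print(f"optimized clear: {optimized:.2f}, worst-case clear: {fixed:.2f}")
```

The fixed full-buffer figure is never smaller than the cached-rectangle figure, which is why it serves as the worst case.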

palemieux commented 1 year ago

I don't think this is consistent with the spec, which does assign some complexity to CLEAR. Setting all the pixels of the output buffer to be empty/transparent clearly does take finite time.

The specification considers that the contents of the document are drawn into buffers that are composited onto the related video element. The idea is that clearing a buffer (prior to painting a new ISD) adds complexity, but not displaying a buffer does not add significant complexity.

css-meeting-bot commented 1 year ago

The Timed Text Working Group just discussed IMSC-HRM - Coalesce empty ISDs into non-empty ISDs w3c/imsc-hrm#50, and agreed to the following:

The full IRC log of that discussion
<nigel> Topic: IMSC-HRM - Coalesce empty ISDs into non-empty ISDs w3c/imsc-hrm#50
<nigel> github: https://github.com/w3c/imsc-hrm/pull/50
<nigel> Nigel: This PR is primarily to deal with the clears and short gaps between ISDs - is that right?
<nigel> Pierre: We're trying to address the industry practice of 2 frame gaps between successive
<nigel> .. subtitles or captions.
<nigel> .. Today the HRM imposes a very high cost to those short gaps.
<nigel> .. This PR addresses that.
<nigel> .. I think we're down to just one question:
<nigel> .. Should there be any cost to displaying an empty ISD?
<nigel> .. i.e. painting it and presenting it
<nigel> .. I am arguing that because the model assumes that subtitles are painted and presented on
<nigel> .. a graphics plane that is then overlaid on to the related video object, then displaying nothing
<nigel> .. costs nothing because you don't display that plane at all.
<nigel> Andreas: Makes sense to me.
<nigel> Pierre: Another goal of this PR is to make as little change as possible.
<nigel> .. That's where my mind is, why I'm arguing that there is no cost with presenting and painting empty ISDs.
<nigel> q+
<nigel> Andreas: As I understand there's no algorithm that would calculate a clearing action, right?
<nigel> .. What it means to have an empty ISD that follows an ISD with content, that would then be a clearing action.
<nigel> Pierre: My suggestion is that clearing happens only in the buffers where things are going to be painted.
<nigel> .. Before drawing a new non-empty ISD you have to clear the back buffer, and that has a cost.
<nigel> .. You first clear that, then draw the non-empty ISD, then when the presentation time comes in you
<nigel> .. flip the buffer and display it.
<nigel> .. If you clear it before then there's no cost because you just don't display the buffer.
<nigel> .. Clearly the "clear" is related to the cost of clearing the buffer onto which an ISD is painted.
<nigel> .. What I'm suggesting is that in fact there is no need to paint empty ISDs so there's no cost to not displaying
<nigel> .. the graphics plane, period.
<nigel> q?
<nigel> Nigel: That's an argument I hadn't understood before, so good to know.
<nigel> ack n
<nigel> Pierre: By the way I'm just trying to come up with something that requires as little change as possible
<nigel> .. to the model. I hadn't thought exactly this 2 weeks ago for instance.
<nigel> Nigel: Good to clarify that you're not saying that the cost of CLEAR is zero.
<nigel> Pierre: Yes, the cost of CLEAR is not zero, but there's nothing to clear for empty ISD.
<nigel> Nigel: Then the question is if we always need at least one buffer to be composited, and
<nigel> .. therefore that we need to count the cost of clearing a buffer at least once for each clear.
<nigel> .. Or can we just say "stop compositing" at no cost.
<nigel> .. I think the conservative approach is to say that we assume there is always exactly one buffer
<nigel> .. being composited, and therefore we need to count the cost of preparing an empty buffer if there is
<nigel> .. to be an empty ISD after the current one.
<atai> q+
<nigel> Pierre: Within the model that's not possible with only 2 buffers.
<nigel> .. If you clear the back buffer for the empty ISD and then present it then you have to wait
<nigel> .. for the empty ISD to be presented which defeats the purpose. You would need a third buffer.
<nigel> .. What I have in mind is that the screen is being refreshed by the video.
<nigel> .. There's no cost to not painting on the next refresh. Not drawing an overlay is a no-op.
<nigel> .. When the next frame comes in you draw neither of the two buffers.
<nigel> ack a
<nigel> Andreas: The HRM is a theoretical construct to calculate presentation cost of ISDs.
<nigel> .. That doesn't mean that is how it is implemented in practice.
<nigel> .. You could view it like you do but of course in practice it could be like Nigel says when empty ISDs come in.
<nigel> .. Follow a logic that is not yet implemented in the HRM.
<nigel> Pierre: Sure, exactly, I agree.
<nigel> .. I think what we're trying to do here is to tease out where the main complexity is, how it scales.
<nigel> .. I think I'm arguing, I've convinced myself, in general, you could have a cost to not displaying pixels,
<nigel> .. you could have a third buffer, maybe your implementation is on still images and clearing has a cost,
<nigel> .. but in the case of video the cost of not displaying an ISD is not substantial and can therefore be safely
<nigel> .. ignored.
<nigel> Nigel: Sudden moment of clarity for me that the current pre-PR HRM is what you have to end up with
<nigel> .. if you have a 2 buffer model.
<nigel> .. I think what we're discussing here is that the idea is the implementation flips from compositing
<nigel> .. buffer A to B to A to B etc
<nigel> .. but in my mind I'd imagined that there was a compositing plane C
<nigel> .. and the task is to blit A to C, B to C, A to C etc which would have a very different set of constraints.
<nigel> .. I'm not sure which it is right now.
<nigel> Pierre: The presentation buffer Pn-1 is directly connected to the display in figure 2.
<nigel> .. This was designed with a simple TV architecture in mind, where one or the other is displayed on screen,
<nigel> .. or read directly by the display circuitry.
<nigel> Nigel: I see what you mean about fig 2.
<nigel> .. But again, fig 2 has no switch about whether the presentation buffers are connected to the display or not.
<nigel> Pierre: We could add that to the model, that with empty ISDs nothing is displayed.
<nigel> .. The PR has text that goes beneath that figure, but we could modify the figure to make it clearer.
<nigel> Nigel: Yes. I'm just wondering which is preferable, to insert a switch to the display
<nigel> .. or to posit a 3rd buffer.
<nigel> Andreas: A new buffer would add more complexity to the model, right?
<nigel> Nigel: I don't really think so.
<nigel> Andreas: For me the question is if for some implementations there would be some cost when an
<nigel> .. empty ISD is going through the change, and if this cost is high enough to be taken into account
<nigel> .. for the calculation of complexity.
<nigel> Pierre: Exactly.
<nigel> Andreas: The cost is negligible and would not go into the calculation.
<nigel> Nigel: I don't think that introducing the third buffer and counting the cost of a CLEAR would
<nigel> .. fail anything that passes today, but there are some documents that could potentially fail it,
<nigel> .. and we should catch those.
<nigel> Pierre: Concerned about the editing complexity that would introduce.
<nigel> Nigel: I haven't done the exercise, it seems like it _should_ be simple, but maybe not.
<nigel> Pierre: If you want to try making the edit, go ahead. I'm not sure what it will look like.
<nigel> .. You need to introduce the cost of drawing into the back buffer.
<nigel> Nigel: I would have to make the assumption of a cost-free blit into the third, composition buffer.
<nigel> Pierre: If you want to give it a shot, as we pointed out, adding the cost of a clear for an empty ISD
<nigel> .. will land up with results that are closer to what we have today. It's not going to
<nigel> .. invalidate documents that are valid today. That's not my concern.
<atai> q+
<nigel> .. It's updating the text in a way that does not cause more trouble than it solves.
<nigel> .. I'm happy to modify figures - I have the master files, though they may be checked in.
<nigel> ack at
<nigel> Andreas: I want to echo a bit Pierre's concern more about the complexity of the specification and
<nigel> .. unexpected consequences of adding a new concept.
<nigel> .. The issue is that the whole model is already not so easy to get, for non-experts
<nigel> .. or people who really need to implement it.
<nigel> .. Adding a new component just increases the complexity of the model.
<nigel> .. From an understanding point of view.
<nigel> .. When we consider adding it, we should compare the benefit of adding it to the
<nigel> .. benefit we get from getting closer to reality for this particular case.
<nigel> .. We heard that even if we add a component it would not really change the results very much.
<nigel> Nigel: Either proposal is a change to the model.
<nigel> .. I'm concerned about clarity of that change.
<nigel> .. The risk is that it's so subtle that it is not actually noticed properly.
<nigel> .. I'm not sure if the mathematical results would be identical.
<nigel> Pierre: They would not be because you'd count one extra CLEAR per sequence of ISDs.
<nigel> .. Neither proposal would invalidate a set of valid documents, which we don't want to do.
<nigel> Nigel: I'm interested to know what is the edge case between the two ideas where
<nigel> .. one would validate a document and the other would say it is invalid.
<nigel> Pierre: The cost of that extra clear would be included in the cost of painting the following non-empty ISD, right?
<nigel> Nigel: I think so.
<nigel> Pierre: Let me try something.
<nigel> .. Scenario: non-empty ISD, empty ISD, non-empty ISD
<nigel> .. The model limits the complexity by saying painting the second non-empty ISD takes some time
<nigel> .. and the system has some time to do that, and what we're saying now is that the
<nigel> .. time available to paint the second non-empty ISD is the time between the presentation time of
<nigel> .. difficult to say without graphics on screen - [shares screen]
<nigel> .. [discussion of 2 buffer model plus 3 buffer model by looking and pointing at the rendering time figure]
<nigel> Pierre: We could schedule some time for a call next week?
<nigel> Nigel: That's a good shout - I should be able to do that.
<nigel> Pierre: I'd really like to get to the point where we ask for feedback, and
<nigel> .. we need to complete this issue before we do so.
<nigel> Nigel: Yes
<nigel> Pierre: I am available at the same time next week
<nigel> Nigel: Me too
<nigel> SUMMARY: Discussions to continue offline and possibly in additional call
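
The difference in CLEAR counts between the two ideas discussed above can be sketched as follows. This is an illustration under stated assumptions, not the specification's algorithm. An ISD sequence is reduced to a list of booleans (`True` for non-empty). The "display switch" idea charges one CLEAR per non-empty ISD. The "empty buffer" idea additionally charges one CLEAR per run of consecutive empty ISDs, which is how the "counted only once" remark is read here. Both function names are hypothetical.

```python
from itertools import groupby


def clears_with_display_switch(isds: list[bool]) -> int:
    """One CLEAR per non-empty ISD; empty ISDs are simply not displayed."""
    return sum(1 for non_empty in isds if non_empty)


def clears_with_empty_buffer(isds: list[bool]) -> int:
    """As above, plus one CLEAR per run of consecutive empty ISDs."""
    empty_runs = sum(1 for non_empty, _ in groupby(isds) if not non_empty)
    return clears_with_display_switch(isds) + empty_runs


# Scenario from the discussion: non-empty ISD, empty ISD, non-empty ISD.
scenario = [True, False, True]
print(clears_with_display_switch(scenario))  # 2
print(clears_with_empty_buffer(scenario))    # 3
```

Because the second count is never lower than the first, a document that passes when the extra CLEAR is charged also passes under the display-switch accounting, assuming everything else is equal.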

css-meeting-bot commented 1 year ago

The Timed Text Working Group just discussed Coalesce empty ISDs into non-empty ISDs w3c/imsc-hrm#50, and agreed to the following:

The full IRC log of that discussion
<nigel> Topic: Coalesce empty ISDs into non-empty ISDs w3c/imsc-hrm#50
<nigel> github: https://github.com/w3c/imsc-hrm/pull/50
<nigel> Nigel: Pierre and I had a design chat about this last week
<nigel> Pierre: Yes, I'd like to share my latest thoughts. I've spent a lot of time on this.
<nigel> .. [shares screen]
<nigel> .. [screen shows updated model with a switch between the presentation buffer and the display, to allow
<nigel> .. clearing to happen by not compositing]
<nigel> .. The path I'm going down as a proposal is that every ISD that comes in, decide if it is empty or not.
<nigel> .. If it is not empty, go through the same process as the current HRM.
<nigel> .. If it is an empty ISD, toggle the output of the current model, to turn off the output of the presentation buffer.
<nigel> .. The idea is that an empty ISD is empty so there is nothing to display.
<nigel> .. What it really means is that the cost of processing empty ISDs is zero because there is nothing to render.
<nigel> .. Separately Nigel and I discussed whether or not there should be additional constraints on the
<nigel> .. duration of empty ISDs, to catch errors seen in IMSC files where very small gaps are introduced
<nigel> .. erroneously.
<nigel> .. Nigel also mentioned that setting minimum time constraints on an ISD might reflect refresh rates etc.
<nigel> .. I'm not so concerned about this because there's already text about the refresh rate possibly not being
<nigel> .. as fast as needed to show every ISD.
<nigel> .. Those are my thoughts.
<nigel> Nigel: Thanks for that. How would it work here if there were a 25th of a second gap between two
<nigel> .. non-empty ISDs?
<atai> q+
<nigel> Pierre: The switch would open, so nothing would be available for display, at the end of the non-empty ISD.
<nigel> .. While the previous non-empty ISD is being displayed, the next non-empty ISD is being drawn into
<nigel> .. the alternate presentation buffer. When the very short empty ISD occurs, the switch opens,
<nigel> .. independently of the presentation buffer.
<nigel> .. When the empty ISD ends it closes and the next non-empty ISD, drawn into the alternate presentation buffer,
<nigel> .. is moved into the active presentation buffer and is displayed.
<nigel> ack at
<nigel> Andreas: When a non-empty ISD follows another non-empty ISD the presentation buffer will stay as it was
<nigel> .. before, but it will be toggled so nothing is sent to the display?
<nigel> Pierre: Exactly.
<nigel> Andreas: When a non-empty ISD comes up is there an additional action to clear the display?
<nigel> Pierre: The assumption is that it costs nothing to display nothing.
<nigel> .. In a traditional graphics system with a graphics plane with a front and back buffer, the assumption is
<nigel> .. that disabling compositing with the graphics plane is a zero cost operation.
<nigel> Andreas: OK
<nigel> Pierre: In modern systems that is true.
<nigel> Andreas: Maybe there is no cost associated with it, but you may have to delete a DOM node or whatever, in HTML,
<nigel> .. so there is something that needs to be done.
<nigel> Pierre: If you're using cues, when no cue is displayed there is no work.
<nigel> .. There's work in preparing a non-empty cue but there's no such thing as an empty cue, just no cue.
<nigel> Andreas: That's right, but there is no duration with the ISD that you send to the display, right?
<nigel> Pierre: The way the HRM is written, the output of the presentation buffer is "made available" to the display.
<nigel> .. That's really broad, "display", here. It depends on how fast the output rendering device can go.
<nigel> .. The HRM says today that it is conceivable that a non-empty ISD, that has content, is so short, that it will
<nigel> .. never be displayed because it comes between two refresh cycles of the display. That's already acknowledged
<nigel> .. by the HRM.
<nigel> Nigel: If I understand right, this model will not catch very short empty ISDs?
<nigel> Pierre: I've agonised over this.
<nigel> .. I really like the idea of catching errors, making an integrated SHALL statement in the model.
<nigel> .. But even with non-empty ISDs we can't guarantee they will be displayed.
<nigel> .. I've been agonising about a SHALL or a SHOULD for the minimum duration of a sequence of empty ISDs.
<nigel> .. The other question is what should the lower bound of that duration be?
<nigel> .. Another important point for the notes:
<nigel> .. Going back to the minimum duration of empty ISDs, I'm uncomfortable:
<nigel> .. Netflix has a min duration of 2 frames, I've heard 3, maybe 1, but nobody ever says what the frame rate is!
<nigel> .. So I'm not sure what the right lower bound is.
<nigel> .. Having a SHOULD for any validator to warn if there is an empty ISD shorter than 1/30s, but making
<nigel> .. it a normative SHALL requirement is another thing.
<nigel> Nigel: That's a strong argument: if we don't have data points to support SHALL statements we ought not to have them.
<nigel> Pierre: A SHOULD would not be bad to have, but it would be a heuristic. I'd say right now, that neither
<nigel> .. 608 nor STL can have gaps shorter than 1/30s but it is possible with EBU-TT-D say.
<nigel> Nigel: [asks about the edge case of a very short non-empty ISD and empty ISD followed by a non-empty ISD]
<nigel> Pierre: The cost of clear is still present for erasing the presentation buffer, that doesn't change.
<nigel> Nigel: Yes, I see.
<nigel> Pierre: The minimum duration of a non-empty ISD has not changed.
<nigel> Nigel: No, it has, because it now also includes the duration of a following empty ISD whereas before it did not.
<nigel> Pierre: Right, absolutely, you could have a non-empty ISD that's a millionth of a second followed by a
<nigel> .. relatively long empty ISD - before, that was not possible.
<nigel> Nigel: Worth noting, it may already be present, that this is not about the readability complexity,
<nigel> .. but the presentation complexity.
<nigel> Pierre: I'm going to note that, maybe make sure there's a statement like that in the document.
<nigel> .. I'll update the pull request.
<nigel> Nigel: Thank you!
<nigel> Pierre: Thank you for hearing me out.
<nigel> SUMMARY: @palemieux to update the pull request
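
The validator heuristic mentioned above (warn when a run of empty ISDs is shorter than 1/30 s) could look like the following sketch. The `ISD` tuple, the function name and the sample document are hypothetical. ISDs are assumed to be contiguous and sorted by time, and the 1/30 s default is only the figure floated in the discussion, not a normative limit.

```python
from fractions import Fraction
from typing import NamedTuple


class ISD(NamedTuple):
    begin: Fraction  # seconds
    end: Fraction    # seconds
    empty: bool


def short_empty_gaps(isds, min_gap=Fraction(1, 30)):
    """Return (begin, end) of each gap between non-empty ISDs shorter than min_gap."""
    gaps = []
    prev_end = None  # end time of the most recent non-empty ISD
    for isd in isds:
        if isd.empty:
            continue
        if prev_end is not None and 0 < isd.begin - prev_end < min_gap:
            gaps.append((prev_end, isd.begin))
        prev_end = isd.end
    return gaps


def frames(n, fps):
    """Duration of n frames at fps frames per second, as an exact fraction."""
    return Fraction(n, fps)


doc = [
    ISD(Fraction(0), Fraction(2), False),
    ISD(Fraction(2), Fraction(2) + frames(2, 24), True),   # 2-frame gap at 24 fps
    ISD(Fraction(2) + frames(2, 24), Fraction(4), False),
    ISD(Fraction(4), Fraction(4) + frames(1, 60), True),   # 1-frame gap at 60 fps
    ISD(Fraction(4) + frames(1, 60), Fraction(6), False),
]
flagged = short_empty_gaps(doc)
print(flagged)  # only the 1/60 s gap is reported
```

The example also shows why a limit expressed in frames is ambiguous without a frame rate: two frames at 24 fps last 1/12 s, while one frame at 60 fps lasts 1/60 s.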

palemieux commented 1 year ago

This change has them the other way around!

I agree. DECE had adopted the opposite convention. Let's fix it now.

palemieux commented 1 year ago

This feels like a nit, and I'm not sure how important it is, but I was surprised by the ordering of front buffer and back buffer. I expected them to be the other way around. By way of quick research I found the Wikipedia page about double buffering, which also seems to suggest that the front buffer is the one that is closest to the display, and the back buffer is the one you draw into before copying into the front buffer. This change has them the other way around!

I like the diagrams: the isd-rendering-time.svg might usefully have some indication of when rendering begins and ends vs when presentation begins and ends. It looks a little strange that the rendering duration for every ISD exactly fills the available time: a document that passes will always have the rendering duration be shorter, and one that fails will have at least one ISD that takes too long to render. I wonder if it's worth adjusting to show that?

See https://github.com/w3c/imsc-hrm/pull/50/commits/696f8fe51a16ecd0982768c14d8fc36a2134d912

nigelmegitt commented 1 year ago

Quoting myself:

we assume that the document is available and parsed into ISDs even before the user begins playback of the related video object

Now I think about it, there's another weird thing here. Is the implementation supposed to pre-render the first ISD into the back buffer just in case the user begins playback? Or is it supposed to predict when playback is going to begin and, IPD earlier than that, begin rendering the first ISD? For a sequence, e.g. of fragmented MP4, this is actually reasonable, since the playback time is predictable. But for a single document instance associated with a video resource that may begin playback at any time, it is strange.

In that specific case, arguably the first ISD should begin being rendered at time zero, to give the implementation a chance. Could this be too philosophical and not something to worry about? I have seen real world implementations where the first ISD did not display if its begin time was too early, including examples where begin == 0s.

palemieux commented 1 year ago

Is the implementation supposed to pre-render the first ISD into the back buffer just in case the user begins playback?

There is latency inherent in starting A/V playback, e.g., the playback start time does not necessarily align with video I-frames or audio frame boundaries. The HRM acknowledges this by introducing a latency of IPD and gives the player some time, before playback starts, to render the first ISD that will be displayed.
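
A minimal sketch of the timing described here, assuming that rendering of the first ISD may begin IPD before its presentation time and that rendering of each later ISD begins at the presentation time of the previous one. The function name is hypothetical and the IPD value in the example is illustrative only.

```python
from fractions import Fraction


def render_window(presentation_times, n, ipd):
    """Return (start, deadline) of the interval available to render the n-th ISD.

    Assumption of this sketch: the first ISD gets an interval of length ipd
    ending at its presentation time; every later ISD gets the interval between
    the previous ISD's presentation time and its own.
    """
    deadline = presentation_times[n]
    start = deadline - ipd if n == 0 else presentation_times[n - 1]
    return start, deadline


times = [Fraction(0), Fraction(3), Fraction(5)]  # presentation times in seconds
ipd = Fraction(1)  # illustrative value, not quoted from the specification

first = render_window(times, 0, ipd)
second = render_window(times, 1, ipd)
print(first, second)
```

With a first ISD that begins at 0 s, the rendering interval starts before time zero, which is the situation the question above is about.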

nigelmegitt commented 1 year ago

There is latency inherent in starting A/V playback

It seems like a fragile design choice here to rely on implementation-dependent and/or codec-dependent features of the related video playback.