w3c / imsc-hrm

IMSC Hypothetical Render Model
https://w3c.github.io/imsc-hrm/spec/imsc-hrm.html

HRM and screen clearing after empty ISD #6

Closed: cconcolato closed this issue 2 years ago

cconcolato commented 3 years ago

I tried running a subtitle example through https://github.com/sandflow/imscHRM and ran into the following problem. Consider this case:

<!-- "This deck of cards smells like pee" -->
<p xml:id="subtitle3" begin="20.062s" end="22.940s" region="region1" style="style1">このトランプ<br/>おしっこの臭いがする</p>
<!-- "Try smelling it" -->
<p xml:id="subtitle4" begin="23.023s" end="24.024s" region="region1" style="style1">嗅いでみて</p>

imscHRM produces:

ERROR:imschrm.hrm:Rendering time exceeded at 23.023s (doc w3c/imsc#7)
  available time: 0.083s | HRM time: 0.101
  Glyph copy count: 0 | render count: 5 | Background draw count: 0

Validation failed

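The XML produces at least 3 ISDs:

ISD1, 20.062s to 22.940s: subtitle3
ISD2, 22.940s to 23.023s: empty
ISD3, 23.023s to 24.024s: subtitle4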

After discussing this with @palemieux, the problem appears to be that the HRM always assumes a clear operation must be performed before painting the text of a new ISD. The IMSC spec says:

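$$\mathrm{DUR}(E_n) = \frac{S(E_n)}{\mathrm{BDraw}} + \mathrm{DUR}_T(E_n) + \mathrm{DUR}_I(E_n)$$

$$S(E_n) = \mathrm{CLEAR}(E_n) + \mathrm{PAINT}(E_n)$$

where $\mathrm{CLEAR}(E_0) = 0$ and $\mathrm{CLEAR}(E_n) = 1$ for $n > 0$, i.e. the Root Container Region in its entirety.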

In the above case, ISD2 has already cleared the screen, so there should be no need to clear it again before painting ISD3.
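A minimal sketch of the quoted DUR(E_n) formula, to show how much of the budget the mandatory clear consumes in this example. It assumes BDraw = 12 s^-1, which we believe is the value IMSC specifies, but it is kept as a parameter. The field descriptions for PAINT, DUR_T and DUR_I are our reading of the spec, and `IsdCost` and `render_duration` are hypothetical names, not imscHRM's code.

```python
from dataclasses import dataclass

# ASSUMPTION: BDraw = 12 s^-1, believed to be the IMSC value of the
# normalized background drawing performance factor. Overridable per call.
BDRAW = 12.0


@dataclass
class IsdCost:
    """Hypothetical per-ISD inputs to the DUR(E_n) formula."""
    paint: float  # PAINT(E_n): normalized area painted, excluding the clear
    dur_t: float  # DUR_T(E_n): text rendering duration, in seconds
    dur_i: float  # DUR_I(E_n): image decoding duration, in seconds


def render_duration(n: int, cost: IsdCost, bdraw: float = BDRAW) -> float:
    """DUR(E_n) = S(E_n) / BDraw + DUR_T(E_n) + DUR_I(E_n)."""
    # CLEAR(E_0) = 0; CLEAR(E_n) = 1 (whole Root Container Region) for n > 0
    clear = 0.0 if n == 0 else 1.0
    s = clear + cost.paint  # S(E_n) = CLEAR(E_n) + PAINT(E_n)
    return s / bdraw + cost.dur_t + cost.dur_i


# Cost of the clear alone for a non-first ISD with nothing else to draw.
clear_only = render_duration(3, IsdCost(paint=0.0, dur_t=0.0, dur_i=0.0))
gap = 23.023 - 22.940  # time available to ISD3 in the example above
print(f"clear: {clear_only:.4f}s, available: {gap:.3f}s")
```

Under the assumed BDraw, the clear alone takes 1/12 s (about 0.0833 s), which already exceeds the 0.083 s available to ISD3 before any glyph is drawn.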

nigelmegitt commented 3 years ago

This is an interesting case. It seems that the HRM effectively penalises authoring patterns that put a brief-duration "clear" ISD between two ISDs that display content. Would it make sense to compute the available rendering duration from the beginning of the most recent non-empty ISD instead of the most recent (potentially empty) ISD?
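A minimal sketch contrasting the two ways of computing the available rendering duration, using the ISDs from the example at the top of this issue. `Isd` and both functions are hypothetical illustrations of the idea, not imscHRM's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Isd:
    """Hypothetical ISD record: presentation interval and emptiness."""
    begin: float  # seconds
    end: float    # seconds
    empty: bool   # True if the ISD presents no content


def available_time_current(isds: List[Isd], k: int) -> float:
    """Current behaviour: time since the immediately preceding ISD began."""
    return isds[k].begin - isds[k - 1].begin


def available_time_from_last_non_empty(isds: List[Isd], k: int) -> float:
    """Proposed: time since the most recent non-empty ISD before k began.

    If every earlier ISD is empty, measure from the first ISD.
    """
    j = k - 1
    while j > 0 and isds[j].empty:
        j -= 1
    return isds[k].begin - isds[j].begin


# ISD1..ISD3 from the example at the top of this issue (k = 0..2 here).
example = [
    Isd(20.062, 22.940, empty=False),
    Isd(22.940, 23.023, empty=True),
    Isd(23.023, 24.024, empty=False),
]
```

For ISD3 (index 2), the current rule yields 0.083 s while the proposed rule yields 2.961 s, because the empty ISD2 no longer resets the window.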

palemieux commented 3 years ago

Say we have ISD_0, ISD_1, ISD_2, ISD_3 where ISD_1 and ISD_2 are empty.

The root container needs to be cleared at the start of ISD_1, so the available render time for ISD_1 is dur(ISD_0).

The render time for ISD_2 is 0, since the root container is already cleared, i.e. it can be merged into ISD_1.

The render time for ISD_3 should not include clearing the root container (since it was already cleared) and the available render time should be dur(ISD_1) + dur(ISD_2).
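The three rules above can be sketched as follows. Runs of consecutive empty ISDs are collapsed into one, an empty ISD pays for the clear within the previous ISD's duration, and the ISD after an empty one skips the clear. This is a hypothetical illustration (`Isd`, `merge_empty_runs` and `render_budget` are made-up names), not the code in the imscHRM branch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Isd:
    begin: float  # seconds
    end: float    # seconds
    empty: bool   # True if the ISD presents no content


def merge_empty_runs(isds: List[Isd]) -> List[Isd]:
    """Collapse each run of consecutive empty ISDs into a single empty ISD."""
    merged: List[Isd] = []
    for isd in isds:
        if merged and isd.empty and merged[-1].empty:
            # ISD_2 folds into ISD_1: render time 0, interval extended.
            merged[-1] = Isd(merged[-1].begin, isd.end, True)
        else:
            merged.append(Isd(isd.begin, isd.end, isd.empty))
    return merged


def render_budget(isds: List[Isd]) -> List[Tuple[bool, float]]:
    """Return (needs_clear, available_time) for each ISD after the first.

    - An empty ISD clears the root container within dur(previous ISD).
    - An ISD following an (merged) empty ISD skips the clear, and its
      available time is the merged empty duration, e.g. dur(ISD_1) + dur(ISD_2).
    """
    merged = merge_empty_runs(isds)
    budget = []
    for prev, cur in zip(merged, merged[1:]):
        needs_clear = not prev.empty
        budget.append((needs_clear, cur.begin - prev.begin))
    return budget


# ISD_0 (content), ISD_1 and ISD_2 (empty), ISD_3 (content)
isds = [
    Isd(0.0, 1.0, False),
    Isd(1.0, 1.5, True),
    Isd(1.5, 2.0, True),
    Isd(2.0, 3.0, False),
]
```

Here `render_budget(isds)` gives `[(True, 1.0), (False, 1.0)]`: the clear happens during ISD_0's second, and ISD_3 gets the full second spanned by ISD_1 and ISD_2 with no clear to pay for. Applied to the original example, ISD3 keeps its 0.083 s window but no longer has to fit the clear into it.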

So I think:

btsimonh commented 3 years ago

After a mail exchange, I must highlight the following:

The INTERVAL (the formal term for the gap between subtitles) in quality translation subtitling should not be zero. So if you penalise it, that is bad…

From a psychometric perspective, the interval triggers the brain to reset and read the next subtitle. If you have a subtitle follow another without a gap, then the second subtitle’s reading duration is compromised as the brain takes precious hundreds of milliseconds to realise that the text changed. So to be able to minimise the subtitle duration (maximise the information available), the interval cannot be zero.

General guidance: an interval of 3 frames is the 'norm'. A scene change also acts as a re-trigger for the brain. In my own research many years ago, one quality measure for subtitles was that short intervals between subtitles should be equal, i.e. if all the intervals between subs are 3 frames, the subtitles are perceived as better than if the interval is 1 frame, then 3, then 5, then 2, etc. The temporal effect on the brain's expectations should not be underestimated. Also, given a choice, I would put the 'default interval' on either side of a scene change: in the same research, I found that ending a subtitle at the 'default interval' before a cut, and starting the new subtitle at the 'default interval' after the cut, gave the most acceptable result, which kind of makes sense.

'Minimum interval' and 'default interval' have been built into every subtitle editing system of value since year dot (except when ITFC told me they did not care and wanted the outcue to overlap the next incue if required; I could not argue because they were the customer, but see 'Teletext' below, which mitigated this at the time and was their excuse).

When I say 'frames', I mean PAL frames (~40 ms), but the exact time is not critical: 3 frames of ~33 ms is fine, but 3 frames at 60 fps is probably out of range. So read '3 frames' as ~120 ms.
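A small helper to make the frame-to-millisecond arithmetic concrete, using exact fractions so that NTSC-style rates such as 30000/1001 do not accumulate rounding. The function name is hypothetical. As an aside (our observation, not stated in the thread), the 83 ms gap in the original example is roughly 2 frames at 23.976 fps.

```python
from fractions import Fraction


def frames_to_ms(frames: int, fps: Fraction) -> float:
    """Convert a count of frames to milliseconds at the given frame rate."""
    return float(frames * 1000 / fps)


print(frames_to_ms(3, Fraction(25)))            # PAL
print(frames_to_ms(3, Fraction(30000, 1001)))   # 29.97 fps
print(frames_to_ms(3, Fraction(60)))            # 60 fps
print(frames_to_ms(2, Fraction(24000, 1001)))   # 23.976 fps
```

This prints 120.0, about 100.1, 50.0 and about 83.4 ms respectively, which matches the reading of '3 frames' as ~120 ms at PAL rates and shows why 3 frames at 60 fps is much shorter.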

Note that this principle was effectively 'built in' to Teletext at an early stage: because data was delivered field by field, and most commercial UK TV channels used only one of the two fields for subtitles, there was ALWAYS a gap of at least a frame between subtitles, since a cleardown header was sent, followed by row data on later fields (in the ITV/CH4 case, frames...).

Regarding the HRM (caveat: I have not looked at it for some time, and did not review it before posting this), surely everything should assume a double-buffered model, i.e. the 'next' subtitle is rendered to a back buffer even while the current one is on air? Clearing the displayed information should be 'free', since in most hardware it would just be a swap of the memory region being displayed.

br,

Simon

palemieux commented 3 years ago

@btsimonh Can you provide sample files?

palemieux commented 3 years ago

I have created a branch of imscHRM that avoids clearing the root container after an empty ISD and merges sequences of empty ISDs:

https://github.com/sandflow/imscHRM/pull/3

btsimonh commented 3 years ago

A couple of extra notes. I asked a translation subtitler with 25 years of experience about interval preferences. She replied: "I like 1 frame, most prefer 2, some prefer 3, few prefer 4."

Two simple test files (@palemieux: note these are modified, as I noticed the outline did not stay within IMSC style limits): intervaltests.zip

(Test file background: I believe the only material difference between the two is the span backgrounds: one has a transparent background with an outline, the other a black background. Most of the subtitles have intervals of 3 frames between them. I originally created these files to encompass what I believe is the minimum baseline subtitle styling used in broadcast translation subtitling over the last 30 years, i.e. vertical position, horizontal position (including mixed line alignments, which is why I chose to use a div per title), italics, colour (rare), and boxing of spans. Not represented here are backgrounds on complete lines and coloured backgrounds (again, rare in translation). The other reason for a div per title is easier editability: only one 'thing' per begin/end pair.)

br, Simon

nigelmegitt commented 3 years ago

The render time for ISD_3 should not include clearing the root container (since it was already cleared) and the available render time should be dur(ISD_1) + dur(ISD_2).

@palemieux, regarding this example from https://github.com/w3c/imsc/issues/575#issuecomment-905004767, it might be worth going further so that the available render time is even greater in this case, i.e. dur(ISD_0) + dur(ISD_1) + dur(ISD_2), if we treat a clear as a special case. Thoughts welcome.

nigelmegitt commented 3 years ago

@btsimonh thank you for your great input on this issue. Our requirements need to encompass all current practices rather than whatever might or might not be "the norm"; it's sufficient here to note that some practices do include a short gap, and that use case does need to be supported.

btsimonh commented 3 years ago

"dur(ISD_0) + dir(ISD_1) + dur(ISD_2)" - for a double buffered render where render time is effectively limited by buffer availability, I agree. However, there are also 'real world' things to consider. (These may be off topic).