w3c / dapt

Dubbing and Audio description Profiles of TTML2
https://w3c.github.io/dapt/

Consider defining restrictions per Script Type #75

Open cconcolato opened 1 year ago

cconcolato commented 1 year ago

Some restrictions on the model, e.g. on cardinality #12 or based on profiles #46, could be dependent on the script type, e.g. the presence of Contextual Text. Should we have the generic definition be loose and have additional sections providing restrictions per script type?

nigelmegitt commented 1 year ago

In #77 I've made the script-type contents an informative note rather than normative requirements, because I think the flexibility needed probably applies to all script types.

For example see Translated Dialogue List (PR preview link)

Otherwise we would likely create a requirement for transient non-conformance.

For example, when creating an Original Language Dialogue List, initially there would be no transcribed text, so if we require, say, minimum 1 div including minimum 1 p, then the first workflow step, of creating effectively an "empty" document, maybe with some pre-defined metadata but no text content, would transiently create a non-conformant document, until at least one Script Event with Text is created. I'd prefer to avoid that.
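To make the transient non-conformance concrete, here is a minimal sketch in Python. It assumes a hypothetical rule of "at least one div containing at least one p"; the function name and the sample documents are illustrative and not taken from the specification.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"  # TTML namespace


def meets_min_cardinality(xml_text: str) -> bool:
    """Hypothetical rule: at least one div that directly contains at least one p."""
    root = ET.fromstring(xml_text)
    return any(
        div.find(f"{{{TT}}}p") is not None
        for div in root.iter(f"{{{TT}}}div")
    )


# First workflow step: a "shell" document with metadata but no text yet.
SHELL = f'<tt xmlns="{TT}" xml:lang="en"><head/><body/></tt>'

# Later step: one Script Event with Text has been created.
WITH_TEXT = (
    f'<tt xmlns="{TT}" xml:lang="en"><body>'
    '<div begin="1s" end="2s"><p>Hello.</p></div>'
    '</body></tt>'
)

print(meets_min_cardinality(SHELL))      # False
print(meets_min_cardinality(WITH_TEXT))  # True
```

Under such a rule the shell document would be rejected until the first text is added, which is the situation described above.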

nigelmegitt commented 1 year ago

Close as "won't do"?

cconcolato commented 1 year ago

Given that the profile syntax is multi-purpose, I still see value in being clear regarding what's allowed for a particular script type. It could allow workflows to validate incoming documents. Examples of potential validations:

nigelmegitt commented 1 year ago
  • a dubbing script should not contain audio description content,

I'm not sure how you'd identify this?

  • a dubbing original script should not contain 'translated' text

This is already a SHOULD requirement.

  • a dubbing script should not have empty divs, at least one text should be present

This seems overly restrictive to me. It prevents reasonable intermediate production stages, like creating a "shell" with empty timed Script Events where some upstream process has identified that there is dialogue to be dubbed, and that shell acts as a task list.

  • a pre-recording audio description should not contain any <audio> elements

Yes, we could add a SHOULD NOT contain <audio> constraint to the pre-recording stages.

  • an as-recorded audio description should not contain rate and pitch information

Yes, we could add a SHOULD NOT constraint there.

  • audio descriptions should not contain characters

This sounds overly prescriptive to me: presence of characters is unlikely to be harmful, and may have a use case that we haven't considered yet, such as marking up specific parts of audio descriptions as being related to individual characters - "wearing a blue hat", etc.

cconcolato commented 1 year ago

I'm not sure how you'd identify this?

Based on https://github.com/w3c/dapt/issues/11#issuecomment-1487799941

This seems overly restrictive to me. It prevents reasonable intermediate production stages, like creating a "shell" with empty timed Script Events where some upstream process has identified that there is dialogue to be dubbed, and that shell acts as a task list.

Then maybe we should add an attribute to indicate if a document is in a final stage or intermediate stage.

This sounds overly prescriptive to me: presence of characters is unlikely to be harmful, and may have a use case that we haven't considered yet, such as marking up specific parts of audio descriptions as being related to individual characters - "wearing a blue hat", etc.

Maybe. The purpose of these restrictions is really to make sure implementations can be simpler when they target only one type of application.

nigelmegitt commented 1 year ago

I'm not sure how you'd identify this?

Based on #11 (comment)

Sorry I mean how would you know what is audio description content? If you have a script marked up as being some kind of dubbing script, and it contains text that describes the video image rather than dialogue, how could an implementation know that? (this reminds me of a conversation about video QC where someone wanted to identify if the video image is upside down!)

This seems overly restrictive to me. It prevents reasonable intermediate production stages, like creating a "shell" with empty timed Script Events where some upstream process has identified that there is dialogue to be dubbed, and that shell acts as a task list.

Then maybe we should add an attribute to indicate if a document is in a final stage or intermediate stage.

This is already an issue: #52

This sounds overly prescriptive to me: presence of characters is unlikely to be harmful, and may have a use case that we haven't considered yet, such as marking up specific parts of audio descriptions as being related to individual characters - "wearing a blue hat", etc.

Maybe. The purpose of these restrictions is really to make sure implementations can be simpler when they target only one type of application.

Every constraint we specify adds implementation complexity, surely?

cconcolato commented 1 year ago

Sorry I mean how would you know what is audio description content? If you have a script marked up as being some kind of dubbing script, and it contains text that describes the video image rather than dialogue, how could an implementation know that? (this reminds me of a conversation about video QC where someone wanted to identify if the video image is upside down!)

I don't see the need to be able to do that (and it's probably not possible). I think we view this issue from different perspectives. In my view, the author/authoring tool sets the application type and script type and then makes sure to respect the restrictions. The receiving tool, possibly only accepting a restricted set of documents (maybe one application, maybe only some script types) rejects documents that declare an application type or document type that it does not support. It also rejects documents that don't respect the constraints of the declared application and script types.
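A rough sketch of that receiving-side behaviour, in Python. Everything here is an assumption made for illustration: the daptm namespace URI, the script type values, the set of accepted types, and the single sample constraint (at most one p per Script Event in an original transcript).

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
# Assumed namespace for the daptm: prefix; check it against the specification.
DAPTM = "http://www.w3.org/ns/ttml/profile/dapt#metadata"

# Hypothetical receiver configuration: the script types this tool accepts.
ACCEPTED_SCRIPT_TYPES = {"originalTranscript", "translatedTranscript"}


def at_most_one_p(root):
    """Sample per-script-type constraint (illustrative only)."""
    return [
        f"div {i} has more than one p"
        for i, div in enumerate(root.iter(f"{{{TT}}}div"))
        if len(div.findall(f"{{{TT}}}p")) > 1
    ]


# Constraints to run for each declared script type.
CHECKS = {"originalTranscript": [at_most_one_p]}


def review(xml_text, checks=CHECKS):
    """Return a list of reasons to reject; an empty list means accept."""
    root = ET.fromstring(xml_text)
    script_type = root.get(f"{{{DAPTM}}}scriptType")
    # Step 1: reject declared types the receiver does not support.
    if script_type not in ACCEPTED_SCRIPT_TYPES:
        return [f"unsupported script type: {script_type!r}"]
    # Step 2: reject documents that break the constraints of the declared type.
    return [msg for check in checks.get(script_type, []) for msg in check(root)]


def doc(script_type, body):
    """Build a tiny test document (helper for this example only)."""
    return (f'<tt xmlns="{TT}" xmlns:daptm="{DAPTM}" '
            f'daptm:scriptType="{script_type}"><body>{body}</body></tt>')


print(review(doc("asRecorded", "")))
print(review(doc("originalTranscript", "<div><p>a</p><p>b</p></div>")))
print(review(doc("originalTranscript", "<div><p>a</p></div>")))
```

Note that the receiver never inspects what the text means; it relies only on what the document declares about itself.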

Every constraint we specify adds implementation complexity, surely?

It depends on the point of view. If you're a dubbing script processor, knowing that a document is simpler than in the general case can make your implementation less complex. If you're an authoring tool that has to support all document types, it's possibly more complex.

nigelmegitt commented 1 year ago

I don't see the need to be able to do that (and it's probably not possible). I think we view this issue from different perspectives. In my view, the author/authoring tool sets the application type and script type and then makes sure to respect the restrictions. The receiving tool, possibly only accepting a restricted set of documents (maybe one application, maybe only some script types) rejects documents that declare an application type or document type that it does not support. It also rejects documents that don't respect the constraints of the declared application and script types.

This seems to have drifted - it began with the suggestion:

  • a dubbing script should not contain audio description content,

If we're agreed that we cannot identify audio description content, then this constraint cannot be implemented, right?

nigelmegitt commented 1 year ago

Gathering the thread re empty divs:

  • a dubbing script should not have empty divs, at least one text should be present

This sounds overly prescriptive to me: presence of characters is unlikely to be harmful, and may have a use case that we haven't considered yet, such as marking up specific parts of audio descriptions as being related to individual characters - "wearing a blue hat", etc.

Maybe. The purpose of these restrictions is really to make sure implementations can be simpler when they target only one type of application.

Every constraint we specify adds implementation complexity, surely?

It depends on the point of view. If you're a dubbing script processor, knowing that a document is simpler than in the general case can make your implementation less complex. If you're an authoring tool that has to support all document types, it's possibly more complex.

Aside from the general question about whether such single application implementations are likely, and if the implementers would benefit from additional constraints in this particular case of "handling empty divs": either way, if it is permitted or prohibited, I would suggest that every implementation needs to be able to handle the case anyway. It's unlikely that any user would want their dubbing script processor to fall over because of "bad" input in the form of an empty div / script event.

cconcolato commented 1 year ago

By "audio description content" I meant "audio description features" (such as using the audio element or animations).

cconcolato commented 1 year ago

If we go with the suggestion in https://github.com/w3c/dapt/issues/52#issuecomment-1499283189, we could indeed let empty divs be present in draft documents, but probably not in final documents.

nigelmegitt commented 5 months ago

See also https://github.com/w3c/dapt/issues/52#issuecomment-2023125700 in which I propose some external rule sets against which documents could be reviewed.

css-meeting-bot commented 5 days ago

The Timed Text Working Group just discussed Consider defining restrictions per Script Type w3c/dapt#75, and agreed to the following:

The full IRC log of that discussion
<nigel> Subtopic: Consider defining restrictions per Script Type w3c/dapt#75
<nigel> github: https://github.com/w3c/dapt/issues/75
<nigel> Cyril: A lot of things have changed since this was opened.
<nigel> .. The gist of this issue is, if I place myself as a Netflix receiver of scripts,
<nigel> .. I want to be able to validate that if I receive a dubbing script it's not an audio description script,
<nigel> .. and vice versa.
<nigel> .. I can see that if somebody delivers a dubbing script to Netflix we will check that the <audio>
<nigel> .. element is not present in it, and reject a delivery if it does, for example.
<nigel> .. We do that for subtitles, have additional restrictions not in IMSC but Netflix-specific.
<nigel> .. When I opened this issue I wondered how many of these restrictions should be core vs
<nigel> .. organisation specific. I thought it would make sense to distinguish these transcript types.
<nigel> .. For example in an original transcript each event should have at most one <p> and the language source
<nigel> .. should be equal to the xml:lang always.
<nigel> .. We can go to CR without this. The risk is that the actual profile that is implemented has more
<nigel> .. restrictions than what is specified in the spec, and some people may impose restrictions beyond
<nigel> .. the spec.
<nigel> Nigel: What should we do - take the label off, or close with no action?
<nigel> .. I think we're missing a proposal for what these per-script type restrictions would be.
<nigel> Cyril: I realise that, we could still decide to go to CR without them.
<nigel> Nigel: I think so, yes.
<atai> q+
<nigel> Cyril: This could be a good segue into the discussion about represents.
<nigel> ack at
<nigel> Andreas: Apologies I lost a bit of progress on the spec.
<nigel> .. Just to check the use case, it is to work out whether a script is a dubbing script or an audio description script?
<nigel> .. I think that is a reasonable use case.
<nigel> q+
<nigel> ack ni
<nigel> Nigel: This has hurt my head in the past because I'm not sure how prescriptive we are about
<nigel> .. the different lifecycle stages for getting to localised versions - is an original language transcript
<nigel> .. a dubbing script, say? It's a start for one, or for subtitles, or both.
<nigel> Cyril: I would like to keep this open and propose something based on represents
<nigel> SUMMARY: @cconcolato to make a proposal

cconcolato commented 4 days ago

Reading the latest version of the specification (including the represents attribute), I could imagine the following restrictions:

We already have shoulds based on daptm:scriptType. By the way, we probably should make it clearer that you are not allowed to have multiple p elements in a Script Event with the same xml:lang. You may have multiple elements having the same daptm:srcLang but they must differ by xml:lang.
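The xml:lang uniqueness rule could be checked along these lines. This is a sketch only: it treats each div as a Script Event, and it simplifies language inheritance to "p, else div, else the root element", which ignores xml:lang set on body or on nested containers.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TT = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def duplicate_langs(xml_text):
    """Return (div index, language) pairs where more than one p shares a language."""
    root = ET.fromstring(xml_text)
    problems = []
    for i, div in enumerate(root.iter(f"{{{TT}}}div")):
        # Simplified inheritance: fall back to the div, then to the root element.
        inherited = div.get(XML_LANG, root.get(XML_LANG, ""))
        counts = Counter(
            p.get(XML_LANG, inherited) for p in div.findall(f"{{{TT}}}p")
        )
        problems += [(i, lang) for lang, n in counts.items() if n > 1]
    return problems


SAMPLE = (
    f'<tt xmlns="{TT}" xml:lang="en"><body>'
    '<div><p xml:lang="fr">Bonjour</p><p xml:lang="fr">Salut</p></div>'
    '<div><p>Hello</p><p xml:lang="fr">Bonjour</p></div>'
    '</body></tt>'
)

print(duplicate_langs(SAMPLE))  # [(0, 'fr')]
```

The second div passes because its two p elements resolve to different languages (en by inheritance, fr explicitly).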

With the constraints that I proposed in https://github.com/w3c/dapt/pull/241#issuecomment-2350767621 (first bullet 2), we get the restriction I suggested earlier:

a dubbing script should not contain audio description content

and vice-versa.

The only new restriction I could think of is:

The following restrictions already mentioned above don't seem to be covered but seemed agreeable:

a pre-recording audio description should not contain any <audio> elements

Yes, we could add a SHOULD NOT contain <audio> constraint to the pre-recording stages.

an as-recorded audio description should not contain rate and pitch information

Yes, we could add a SHOULD NOT constraint there.
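
As a sketch of how these two SHOULD NOT constraints might be reviewed (Python). The namespace URIs, the script type spellings `preRecording` and `asRecorded`, and the attribute names used for rate and pitch are all assumptions for illustration. Because these are SHOULD NOT constraints, the function returns warnings rather than rejecting the document.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTA = "http://www.w3.org/ns/ttml#audio"  # assumed namespace for tta: attributes
DAPTM = "http://www.w3.org/ns/ttml/profile/dapt#metadata"  # assumed

# Illustrative attribute names only; substitute whatever the spec defines.
RATE_PITCH_ATTRS = {f"{{{TTA}}}pitch", f"{{{TTA}}}rate"}


def warnings_for(xml_text):
    """Return warnings for the two SHOULD NOT constraints discussed above."""
    root = ET.fromstring(xml_text)
    script_type = root.get(f"{{{DAPTM}}}scriptType")
    found = []
    if script_type == "preRecording":
        # Pre-recording scripts should not contain any <audio> element.
        if root.find(f".//{{{TT}}}audio") is not None:
            found.append("preRecording script contains <audio>")
    elif script_type == "asRecorded":
        # As-recorded scripts should not carry rate or pitch information.
        if any(RATE_PITCH_ATTRS & set(el.attrib) for el in root.iter()):
            found.append("asRecorded script carries rate or pitch attributes")
    return found


def doc(script_type, body):
    """Build a tiny test document (helper for this example only)."""
    return (f'<tt xmlns="{TT}" xmlns:tta="{TTA}" xmlns:daptm="{DAPTM}" '
            f'daptm:scriptType="{script_type}">'
            f'<body><div>{body}</div></body></tt>')


print(warnings_for(doc("preRecording", '<p><audio src="clip.wav"/></p>')))
print(warnings_for(doc("asRecorded", '<p tta:pitch="+10%">text</p>')))
print(warnings_for(doc("asRecorded", '<p><audio src="clip.wav"/></p>')))
```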