mtmse / pipeline

Super-project that aggregates all Pipeline-related code, provides a common tracker for Pipeline-related issues, and holds the Pipeline website
http://daisy.github.io/pipeline

Numbers in numbered lists are not read in TTS #7

Open martinpub opened 3 years ago

martinpub commented 3 years ago

Currently, numbers are not announced by TTS in numbered lists.

Sample markup for which Pipeline should send numbers as text to the TTS:

<ol>
<li><span id="st7-56">Vilka argument för förändringar framfördes i försvarsdebatten av politisk och militär försvarsledning och av fria opinionsbildare?</span><span aria-label="15" role="doc-pagebreak" title="15" ns2:type="pagebreak" id="page-15" class="page-normal"><span id="st7-57">15</span></span></li>
<li><span id="st7-58">Hur förändrades i verkligheten inställningen till försvarsmaktens uppgifter, struktur och organisation?</span></li>
<li><span id="st7-59">Hur kunde de dramatiska förändringarna ske utan politiskt motstånd och utan en omfattande kritisk debatt?</span></li>
</ol>

Comment from @bertfrees:

Incorporating the number in the text that is sent to the TTS is the way to go. We have code in Pipeline to number lists in the braille output based on CSS. I think it's a matter of generalizing this code so that it can be applied for speech output. Not a terribly complicated thing to do, but needs a bit of work.

Note that the Nordic Guidelines define a class "plain", which suppresses bullets and numbers in the visual rendering (i.e. list-style-type: none;) and can in some cases also be applied to <ol>. If list numbers are added to the text sent to the TTS, can this honour <ol> elements with the class "plain" (or rather, any <ol> for which the resulting CSS, through whatever class attribute, suppresses the marker), so that no number text is sent in those cases? Or would it be better to solve this with some speech-specific CSS in the source file?

bertfrees commented 3 years ago

Ideally, the Pipeline TTS processing should take into account the style sheets attached to the source. This is how it works for braille output. However, this is not yet the case for TTS: CSS needs to be provided through the "TTS config" input, as explained in http://daisy.github.io/pipeline/Get-Help/User-Guide/Text-To-Speech/. So in the initial solution, the list-style-type: none style would have to be provided separately, even though it is also present in the EPUB.
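
As a rough sketch of what such a separately provided rule might look like, assuming the Nordic "plain" class from the question above is what marks the lists to exclude. How the style sheet is referenced from the TTS config is covered by the user guide linked above and is not shown here.

```css
/* Hypothetical speech style sheet, supplied through the "TTS config" input. */
/* Assumption: the class "plain" marks lists whose numbers must not be spoken. */
ol.plain {
    list-style-type: none;
}
```

Because list-style-type is an inherited property, setting it on the ol is enough for it to apply to each li inside.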

martinpub commented 3 years ago

Thanks for clarifying, @bertfrees! I will make sure to incorporate the speech part of the CSS into the TTS config file(s). But if list-style-type: none; is present in the TTS config, the future functionality in Pipeline that adds numbers to the text sent to the TTS could exclude these lists, right?

bertfrees commented 3 years ago

Right.

martinpub commented 3 years ago

Great, thanks!

martinpub commented 3 years ago

Hi @kalaspuffar, after discussing internally we think we should prioritise sentence detection and DAISY conversion over the TTS items. That way we can start the production of human-narrated talking books as soon as possible.

bertfrees commented 3 years ago

@carl-textalk @kalaspuffar I'll elaborate a bit on the solution I had in mind for this issue.

The px:tts-for-epub3 step is responsible for transforming a set of HTML documents (accompanied by lexicons and CSS style sheets) into a set of audio clips. At the beginning of this step, before sentence and word detection (px:html-break-detect) happens, there is an opportunity to pre-process the HTML, notably to generate content that will be spoken but will not be part of the resulting HTML. This approach is already used to insert special speech-only announcements, for instance the word "sidan" (Swedish for "the page") before page numbers.

This is accomplished through CSS, for instance:

[role=doc-pagebreak]::before {
    content: "sidan ";
}

List numbering seems a logical extension of this because list numbers are also generated content (only present in speech and on screen but not in the text) and how lists are numbered on screen is determined by CSS too. If we want the visual and aural content to match, we have to look at the CSS.

Take for example the following CSS style sheet for ordered lists, which is the default style for HTML:

ol {
    list-style-type: decimal;
    counter-reset: list-item;
}

li {
    display: list-item;
    counter-increment: list-item;
}

The list-style-type property of an li element determines the content of its generated ::marker pseudo-element. In this case:

li::marker {
    content: counter(list-item, decimal) ". ";
}
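
To illustrate the same mapping for another style, here is a minimal sketch in which a different list-style-type value leads to a different marker. The class name "alpha" is hypothetical, and the second rule only spells out the marker content that would be computed, following the pattern above. It is not something an author would need to write.

```css
/* Hypothetical class: letter the items of this list instead of numbering them. */
ol.alpha {
    list-style-type: lower-alpha;
}

/* Resulting marker, by analogy with the decimal case: "a. ", "b. ", "c. " */
ol.alpha > li::marker {
    content: counter(list-item, lower-alpha) ". ";
}
```

With list-style-type: none there would be no marker content at all, which is the case discussed earlier in this thread for lists that should not be numbered.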

There are also some other CSS features related to list numbering, such as the @counter-style rule, the counter-set property and the symbols() function.
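
For reference, a short sketch of what these three features look like in plain CSS. All names ("paren-decimal", "paren", "resume", "stars") are made up for illustration, and whether each feature would be supported for speech output is an open assumption.

```css
/* @counter-style: define a custom marker style,
   here decimal numbers followed by ") " instead of ". " */
@counter-style paren-decimal {
    system: extends decimal;
    suffix: ") ";
}

ol.paren {
    list-style-type: paren-decimal;
}

/* counter-set: force the counter to a value on one element.
   With the default ol/li rules above, the first item is numbered 5
   and the following item 6. */
ol.resume > li:first-child {
    counter-set: list-item 5;
}

/* symbols(): an anonymous counter style, here cycling through two symbols. */
ul.stars {
    list-style-type: symbols(cyclic "*" "†");
}
```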

The logic to compute and insert counter values based on CSS has already been fully implemented (in XSLT) in the context of the PEF production. With relatively little effort this code could be ported/adapted/generalized for TTS.

bertfrees commented 3 years ago

@kalaspuffar @carl-textalk I have added a file where I explain how I would break down the problem of list numbering into smaller steps: https://github.com/daisy/pipeline/commit/168c6f16de4817c0b0f25e7c0211e28d11042a84. Hope it is clear.

martinpub commented 3 years ago

Will return to this after summer.