w3c / publishingcg

Repository of the Publishing Community Group
https://www.w3.org/community/publishingcg/

Best practice document: Extracting data for TTS and a "reader mode" #69

Open HadrienGardeur opened 5 months ago

HadrienGardeur commented 5 months ago

Text-to-speech (TTS) is among the most popular features in reading apps and is slowly becoming a must-have feature in Web browsers as well.

But despite the popularity and usefulness of TTS, no best practice document gives developers guidance on how to implement it. The group working on accessibility for fixed-layout (FXL) publications has also identified that, in addition to TTS, text extracted from an FXL resource could be used to provide a "reader mode" of the current page/spread, enabling users to adjust the text and layout to their needs.

For both TTS and a reader mode, reading systems need guidance about the way they should extract data from XHTML to build these alternate renderings:
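To make the extraction question concrete, here is a minimal sketch of one possible approach, not a proposed best practice. It assumes the resource is well-formed XHTML without named entities such as `&nbsp;` (the standard-library XML parser rejects those), and the `BLOCK_TAGS` and `SKIP_TAGS` sets, as well as the function names, are hypothetical choices made for illustration. It walks the document in source order, emits one text block per block-level element, and substitutes `alt` text for images.

```python
import xml.etree.ElementTree as ET

# Hypothetical choices for this sketch: which elements start a new block
# (one TTS utterance or one reader-mode paragraph) and which are skipped.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "h5", "h6", "li", "figcaption"}
SKIP_TAGS = {"script", "style", "head"}


def local_name(tag):
    """Strip the XML namespace from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1] if isinstance(tag, str) else ""


def inline_text(element):
    """Collect the text of an element and its descendants in document order."""
    name = local_name(element.tag)
    if name in SKIP_TAGS:
        return ""
    if name == "img":
        # Use the text alternative in place of the image.
        return element.get("alt", "")
    parts = [element.text or ""]
    for child in element:
        parts.append(inline_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def extract_blocks(xhtml):
    """Return whitespace-normalized text blocks in document (source) order."""
    root = ET.fromstring(xhtml)
    blocks = []

    def walk(element):
        name = local_name(element.tag)
        if name in SKIP_TAGS:
            return
        if name in BLOCK_TAGS:
            text = " ".join(inline_text(element).split())
            if text:
                blocks.append(text)
            return  # nested blocks are merged into their parent block
        for child in element:
            walk(child)

    walk(root)
    return blocks


SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Ignored</title><style>p { color: red; }</style></head>
<body>
<h1>Chapter 1</h1>
<p>The   <em>quick</em> fox <img src="fox.png" alt="(a red fox)"/> jumps.</p>
<script>var x = 1;</script>
</body>
</html>"""

if __name__ == "__main__":
    for block in extract_blocks(SAMPLE):
        print(block)
```

This sketch ignores CSS entirely, so it drops text that sits outside the listed block elements and cannot tell when the visual order of an FXL page differs from the source order. Those are exactly the cases where guidance would be needed.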

sueneu commented 5 months ago

I agree. Building a Reader Mode view from TTS would be an efficient way to give the user choices for accessing the content of a book. A single source would mean consistency between audio mode and visual mode. Using the same code for Reader Mode and TTS would reduce redundant work in EPUB production.

A best practice document would be helpful even if TTS doesn't ultimately work out as a basis for Reader Mode. Improved and consistent TTS among reading systems would lower the expense of making an accessible ebook. Publishers who can't create audio overlays could rely on robust TTS to make compliant EPUBs. End users who require smaller EPUB files would benefit from an audio option without media overlays. And anecdotally, few publishers and users are satisfied with the current TTS experience.

wareid commented 5 months ago

Research to do/Questions to ask:

cookiecrook commented 5 months ago

There is also overlap with the CSS algorithm for converting to plaintext: https://www.w3.org/TR/css-text-4/#plaintext

cookiecrook commented 5 months ago

And work in ARIA/AccName...

HadrienGardeur commented 5 months ago

VitalSource seems to have a two-fold approach with a simplified and a detailed reading mode, as described by @rickj in the following comment: https://github.com/w3c/publishingcg/issues/72#issuecomment-1942724261

This is exactly the kind of information that we're looking for to kickstart this joint effort on TTS and reader mode.