rumkin / duotone-reader

Screen reading enhancement with duo-voice text reading.
https://rumkin.github.io/duotone-reader

Include a problem statement #2

Open Malvoz opened 4 years ago

Malvoz commented 4 years ago

I've only used a screen reader on a few occasions for testing purposes, and I wasn't aware that there's a need among screen reader users to have multiple voices read a web page aloud. In a quick search, I landed on the User Agent Accessibility Guidelines (UAAG) Group Note, which in section 1.6.1 says:

1.6.1 Speech Rate, Volume, and Voice: If synthesized speech is produced, the user can specify the following: (Level A)

  • Speech rate
  • Speech volume (independently of other sources of audio)
  • Voice, when more than one voice is available

While the last item in the list does indeed indicate that multiple voices can be used when reading web pages aloud, I think the demo page could better describe the use case(s). Currently, I can only infer from the described solution that there is demand for this.
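
For context, those same three properties already surface per utterance in the Web Speech API that browsers ship. A minimal sketch (not taken from this project; the English-voice lookup is just an illustrative assumption):

```ts
// Minimal Web Speech API sketch: the three UAAG 1.6.1 properties map onto
// SpeechSynthesisUtterance fields. The voice lookup is an illustrative
// assumption, not code from duotone-reader.
const utterance = new SpeechSynthesisUtterance("Example sentence.");
utterance.rate = 1.2;   // speech rate (1 is the default)
utterance.volume = 0.8; // volume on a 0..1 scale, independent of other audio
const voices = speechSynthesis.getVoices();
// Voice, when more than one is available.
utterance.voice = voices.find((v) => v.lang.startsWith("en")) ?? null;
speechSynthesis.speak(utterance);
```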

Does that make sense?

Malvoz commented 4 years ago

I.e., do popular screen readers already allow users to separate metadata vs content this way? Have users asked for it? Even if they haven't, would it benefit them - or confuse them?

rumkin commented 4 years ago

I landed on the User Agent Accessibility Guidelines (UAAG) Group Note, which in section 1.6.1 says

Thanks for the reference, I will read it. I've read the W3C SpeechSynthesis API spec, which describes the same synthesis options. I think single-voice synthesis became the standard for historical reasons: when screen readers were first being developed, there weren't nearly as many development tools as we have today, and the early challenges were much greater, so a multi-voice system was probably considered next to impossible to implement. That is why I think such a system simply hasn't been researched before; I haven't found any research on multi-voice synthesis yet.
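
To make the idea concrete, here is a rough sketch of how two voices could be assigned on top of the standard SpeechSynthesis API. How the text is split into metadata vs. content, and which voices are picked, are assumptions for illustration, not the project's actual implementation; whether a second voice exists at all depends on what is installed in the user's browser/OS:

```ts
// Rough duo-voice sketch on top of the standard SpeechSynthesis API.
// The Fragment shape and voice selection are illustrative assumptions.
type Fragment = { text: string; isMetadata: boolean };

function speakDuotone(fragments: Fragment[]): void {
  const voices = speechSynthesis.getVoices();
  const contentVoice = voices[0] ?? null;          // voice for body text
  const metadataVoice = voices[1] ?? contentVoice; // second voice for headings, links, etc.

  for (const { text, isMetadata } of fragments) {
    const u = new SpeechSynthesisUtterance(text);
    u.voice = isMetadata ? metadataVoice : contentVoice;
    speechSynthesis.speak(u); // utterances queue and play in order
  }
}

speakDuotone([
  { text: "Section heading", isMetadata: true },
  { text: "The paragraph text that follows the heading.", isMetadata: false },
]);
```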

Have users asked for it? Even if they haven't, would it benefit them - or confuse them?

Actually I don't know, and that is what I want to figure out. This project is a research effort started to find answers to the questions you ask. The current API design doesn't allow for independent research; it only realizes solutions that are already well known. I think it should be enhanced to make speech synthesis itself easier to research, so that independent developers can experiment with this technology and find more and more solutions.