impress / impress.js

It's a presentation framework based on the power of CSS3 transforms and transitions in modern browsers and inspired by the idea behind prezi.com.
http://impress.js.org
MIT License

HTML5 Text To Speech API integration #420

Open issue, opened by mofosyne 10 years ago

mofosyne commented 10 years ago

It would be interesting if you could sync a transcript to be spoken aloud using the HTML5 Text To Speech API.

It would make for a more hands-on presentation.

http://updates.html5rocks.com/2014/01/Web-apps-that-talk---Introduction-to-the-Speech-Synthesis-API

This works best in Chrome, but it is likely to be supported by all browsers since it is an official HTML5 standard.

FagnerMartinsBrack commented 8 years ago

Hi @mofosyne, in an effort to clear up older issues/PRs, we are pinging back to ask whether you are still tracking this request.

To give a little bit of context: a decision was recently made to make the project's development more active, and the first task is to clear up older issues like this one to see if the OP is still interested in keeping it going.

mofosyne commented 8 years ago

Well, this was more of a wishlist kind of thing, so I am tracking it in the sense that I would still like to see it happen. But I'm not sure HTML5 TTS technology is going to mature soon enough for this to be viable :/

FagnerMartinsBrack commented 8 years ago

Hi @mofosyne, thanks for the response! Can you please provide a real use case where this feature makes sense in the context of a presentation, either a site or a talk presentation?

I believe this would be best implemented as a plugin once we have a documented way of creating plugins, but we need to know the use case anyway, for future reference.

mofosyne commented 8 years ago

Standalone presentations, I imagine? Would that make sense? The issue is that you would also need to implement things like virtual cursors, to fully reproduce how a presenter behaves during an actual online lecture.

(So you would probably also need to invent a semi-markdown syntax to mark up the inflection and stress in a person's voice, as well as the parts of the speech where they would physically point at or highlight a passage.)

mofosyne commented 8 years ago

Basically, instead of recording audio to go with the presentation, you type the speech into it. (Hmmm... which would also end up doubling as a very accessible way for the deaf to view a web presentation as well).

FagnerMartinsBrack commented 8 years ago

Can't you just automate the whole process as it is now using the .next() API? I imagine one could read the content of the steps with the Speech API and call .next() once it is done. This is definitely not something that impress.js should take care of in the core.
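To illustrate that suggestion, here is a minimal sketch (not an impress.js feature): when a step is entered, it reads the step's text with the browser's Web Speech API and calls impress().next() once the utterance's end event fires. The helper name narrateAndAdvance is hypothetical, and the speech objects and the impress() API object are passed in as parameters so the logic can also be exercised outside a browser.

```javascript
// Sketch only: narrateAndAdvance is a hypothetical helper, not an impress.js API.
// synth     - object with speak(), e.g. window.speechSynthesis
// Utterance - constructor such as window.SpeechSynthesisUtterance
// api       - object returned by impress(), which provides next()
function narrateAndAdvance(stepEl, synth, Utterance, api) {
  const text = stepEl.textContent.trim();
  if (!text) {
    api.next(); // nothing to read: move straight to the next step
    return null;
  }
  const utterance = new Utterance(text);
  // Advance only after the speech engine reports the utterance has finished.
  utterance.onend = () => api.next();
  synth.speak(utterance);
  return utterance;
}

// Browser wiring; skipped outside a browser or where speech synthesis is missing.
if (typeof document !== "undefined" && "speechSynthesis" in window) {
  document.addEventListener("impress:stepenter", (event) => {
    // impress.js dispatches stepenter on the step element, and the event bubbles.
    narrateAndAdvance(event.target, window.speechSynthesis,
                      window.SpeechSynthesisUtterance, impress());
  });
}
```

One caveat: impress().next() wraps from the last step back to the first, so this sketch would keep cycling through the presentation; a real plugin would probably stop narrating at the final step.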

> (Hmmm... which would also end up doubling as a very accessible way for the deaf to view a web presentation as well).

This can already be done using WAI-ARIA, a specification for special attributes that let HTML be consumed through a text reader by deaf people. Again, this is something that can be done on a per-use basis.

mofosyne commented 8 years ago

That's a good point. Is there already a project that would read out a closed-captioning text file and sync the synthesized speech to press .next() and highlight/move a cursor as needed?

FagnerMartinsBrack commented 8 years ago

> That's a good point. Is there already a project that would read out a closed-captioning text file and sync the synthesized speech to press .next() and highlight/move a cursor as needed?

Not that I know of, you could be the first one to show that :)

FagnerMartinsBrack commented 8 years ago

@mofosyne Re-reading my comments, I realised that you were talking about deaf people, not blind people. WAI-ARIA, as far as I know, only addresses accessibility for blind people (it allows efficient use of a screen reader).

This is definitely not a feature for the core itself, but rather something to be developed as a plugin. In the current state of impress.js, is there anything blocking this from being achieved on your side?

mofosyne commented 8 years ago

I imagine it is more convenient to write the speech script inline with the actual slides. This is because if people distribute the slides, you want the speech script to go along for the ride as well (having the HTML slides and the transcription as separate files means the transcription is more likely to be accidentally omitted).

The logic could possibly be done as a plugin, but there needs to be a consistent way to insert the speech text into the slide source that doesn't mess up the slide if the plugin is missing.


About baking speech synthesis of the transcription into the slide animation: while not aimed at blind people, such a feature is useful as an easy way to edit and distribute lecture slides without having to re-record the audio. I'm thinking of this in the context of something like a wiki platform, but for lecture-style content. Not sure how practical it is in reality, but it is interesting to consider.

But anyhow, HTML5 web speech synthesis is too new to be effective at the moment. I coded up a proof of concept, which had multiple issues.

I can send you a link to my proof of concept so you can get an idea of what I was thinking, and why I think it still has some way to go before we should try to implement this in impress.js. What's your email?

mofosyne commented 8 years ago

Oh, don't worry, I found your email on your profile. I sent you an email with the proof of concept I wrote a while ago.

FagnerMartinsBrack commented 8 years ago

> I can send you a link to my proof of concept so you can get an idea of what I was thinking, and why I think it still has some way to go before we should try to implement this in impress.js.

Can you please? It would be very useful.

> What's your email?

Can't you post it here? GitHub accepts zip files. I would like to keep the discussion online, because anyone has access to it and could help find an optimal solution for the problem.

mofosyne commented 8 years ago

I'll certainly consider it, maybe after I get some comments from you (it might need some cleaning up). Btw, did you get my email?

On Mon, Apr 25, 2016 at 1:52 AM, Fagner Brack notifications@github.com wrote:


> Can't you post it here? GitHub accepts zip files https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/. I would like to keep the discussion online, because anyone has access to it and could help find an optimal solution for the problem.


mofosyne commented 8 years ago

Just a note on implementing speech synthesis of such lecture transcriptions: I found a pretty nifty Text To Speech library written entirely in JavaScript, at http://www.masswerk.at/mespeak/ . Since it runs entirely client side, it sidesteps the unreliability of the current experimental Web Speech API (which is entirely unsupported in Firefox). The quality is not as good as Google's speech, but it is reliable in both Chrome and Firefox (e.g. this JS rap: http://www.masswerk.at/mespeak/rap/ ).

FagnerMartinsBrack commented 8 years ago

@mofosyne

> I imagine it is more convenient to write the speech script inline with the actual slides ...

You say "speech script". I initially understood that this request was about reading the content of the steps out loud, but it seems to be about a transcript that is read while the steps are shown. I think I get it now.

So, if you want a transcript to be read alongside the actual slides, you don't need anything implemented in impress.js: you can just use the impress:stepenter and impress:stepleave events to detect when the step changes, so that the matching part of the transcript is read while that step is shown.

> ... but there needs to be a consistent way to insert the speech text into the slide source that doesn't mess up the slide if the plugin is missing.

You can do that right now without worrying about messing up any step if the "plugin" is missing. Just listen to the events mentioned above and put everything in a separate script on the page; that way you can remove the feature just by removing the script tag. It doesn't have to be a script tag either: any other module-loading method available on the web (ES6 modules, AMD, etc.) works.

Basically, the request for this feature is totally fine; it would definitely be a nice-to-have feature when using impress.js.
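A minimal sketch of that event-based approach, assuming the transcript is kept in a hypothetical object keyed by step id and that all of this lives in its own script file, so removing that script tag removes the narration without touching the steps. createNarrator and the example step ids are illustrative names, not part of impress.js.

```javascript
// Hypothetical transcript, keyed by step id. It lives in this script rather
// than in the step markup, so the slides are unaffected if the file is dropped.
const transcript = {
  intro: "Welcome. This talk is about step-based presentations.",
  overview: "Here is what we will cover today.",
};

// createNarrator is a hypothetical helper returning the two event handlers.
function createNarrator(lines, synth, Utterance) {
  return {
    onStepEnter(event) {
      const text = lines[event.target.id];
      if (text) {
        synth.speak(new Utterance(text)); // read this step's part of the script
      }
    },
    onStepLeave() {
      synth.cancel(); // stop talking as soon as the step is left
    },
  };
}

// Browser wiring; skipped outside a browser or where speech synthesis is missing.
if (typeof document !== "undefined" && "speechSynthesis" in window) {
  const narrator = createNarrator(transcript, window.speechSynthesis,
                                  window.SpeechSynthesisUtterance);
  document.addEventListener("impress:stepenter", narrator.onStepEnter);
  document.addEventListener("impress:stepleave", narrator.onStepLeave);
}
```

Cancelling on impress:stepleave keeps narration for one step from running over the next when the presenter moves on early.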

But there is a problem.

As you are probably aware, impress.js has a lot of feature requests and few development resources. I really appreciate the excitement about new features in impress.js and your help in contributing to it, thanks for that. However, in order to continue, I need at least the following questions answered:

  1. Given the suggestions I gave above and considering the current state of impress.js, is there anything blocking this from being achieved on the developer side?
  2. Are you suggesting this feature based on a real use case you have, or is it just a "nice to have"?

Be aware that the purpose of impress.js is not just to be a presentation tool for talks, but to use an infinite canvas to provide a step-based mechanism for the web, with presentation talks being one of its main use cases. We wouldn't want to add features to the core that tackle only one of the many use cases, although we are willing to support other devs doing innovative things on top of impress.js.

Looking forward to your response. Thanks.

henrikingo commented 6 years ago

Hi @mofosyne

impress.js now includes impressConsole.js. With it comes a convention of using <div class="notes"> to write text within each slide that is picked up and shown in the speaker console. (Note that it's still the presentation author's responsibility to use display: none CSS to actually hide these notes in the presentation.)

A text-to-speech plugin could use this same convention, at least as its default behavior. It should listen for the impress:stepenter event, look up whether the step contains a div with notes, and if so, send that text to the text-to-speech engine.
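A sketch of that notes-based default, assuming the <div class="notes"> convention used by impressConsole.js. notesText and speakNotes are hypothetical helper names, and reading document.documentElement.lang to hint the voice language is an illustrative choice, not something the plugin README prescribes.

```javascript
// Hypothetical helpers following the impressConsole.js convention of
// <div class="notes"> inside each step; not part of impress.js itself.
function notesText(stepEl) {
  // textContent returns the text even when the notes are hidden with display: none.
  return Array.from(stepEl.querySelectorAll("div.notes"), (el) => el.textContent.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

function speakNotes(stepEl, synth, Utterance, lang) {
  synth.cancel(); // drop narration left over from the previous step
  const text = notesText(stepEl);
  if (!text) {
    return false; // step has no notes: stay silent
  }
  const utterance = new Utterance(text);
  if (lang) {
    utterance.lang = lang; // e.g. "en-US"; hints which voice language to use
  }
  synth.speak(utterance);
  return true;
}

// Browser wiring; skipped outside a browser or where speech synthesis is missing.
if (typeof document !== "undefined" && "speechSynthesis" in window) {
  document.addEventListener("impress:stepenter", (event) => {
    speakNotes(event.target, window.speechSynthesis,
               window.SpeechSynthesisUtterance, document.documentElement.lang);
  });
}
```

Because the spoken text is the same text the speaker console already shows, the script travels inline with the slides, and a deck without this plugin still renders exactly as before.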

See the Plugin README for more information on creating plugins.

janishutz commented 1 year ago

I might be interested in developing such a plugin (or, more likely, integrating it into the speaker console).