kiwix / kiwix-js

Fully portable & lightweight ZIM reader in Javascript
https://www.kiwix.org/
GNU General Public License v3.0

Implement text-to-speech for reading articles #166

Open mossroy opened 8 years ago

mossroy commented 8 years ago

Like on the Android version of Kiwix.

Using the Speech Synthesis API: https://caniuse.com/#feat=speech-synthesis
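For reference, a minimal sketch of what calling this API could look like. The function name `readAloud` and the `env` parameter are illustrative assumptions (in a browser `env` would be `window`); only `speechSynthesis`, `SpeechSynthesisUtterance`, `utterance.lang` and `speak()` come from the Web Speech API itself.

```javascript
// Sketch only: feature-detect the Speech Synthesis API and read a string aloud.
// `env` is normally `window`; it is a parameter so the function can be tried outside a browser.
function readAloud(env, text, lang) {
  // Where the API is missing, do nothing and report it, so the caller can hide the button.
  if (!env || !env.speechSynthesis || !env.SpeechSynthesisUtterance) {
    return false;
  }
  const utterance = new env.SpeechSynthesisUtterance(text);
  if (lang) {
    utterance.lang = lang; // BCP 47 tag such as "en" or "fr-FR"
  }
  env.speechSynthesis.speak(utterance);
  return true;
}

// In a browser: readAloud(window, 'Hello from Kiwix', 'en');
```

Returning `false` when the API is absent lets the caller decide whether to show a button at all.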

kasemmarifet commented 6 years ago

I'll try to work on this as part of GoogleServe.

Jaifroid commented 6 years ago

Hi @kasemmarifet, thanks for your welcome contribution. I'm just testing your PR. It works offline on Chromium and Firefox Quantum on Windows 10 for English-language ZIMs, so congratulations!

I can't get it working on Microsoft Edge or Internet Explorer. On Edge, the speech button appears, and pressing it appears to initiate something, but no sound comes out. When I turn it off, a loudspeaker icon briefly appears in the browser tab for the page, indicating that some audio was activated at some point, but paradoxically only on turning off the read button, and there is still no sound. The API is described here: https://blogs.windows.com/msedgedev/2016/06/01/introducing-speech-synthesis-api/ and there are some examples of how to use it for Edge, so maybe you can adapt your code.

On Internet Explorer there is no button, but that is expected, as it is not fully HTML5 compliant. It seems to degrade gracefully (by not showing the button).

However, when I load a Spanish-language ZIM in Firefox, or a French-language ZIM in Chromium, the read button tries to use the English-language text-to-speech engine to read the non-English text. I'm sure that can be remedied fairly easily.
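One possible remedy, sketched under the assumption that the article's language is already known as a BCP 47 tag: choose a voice whose `lang` matches it before speaking. `pickVoice` is a hypothetical helper, not code from the PR; `getVoices()`, `voice.lang` and `utterance.voice` are part of the Web Speech API.

```javascript
// Sketch only: choose a voice for a language tag from SpeechSynthesisVoice-like objects.
function pickVoice(voices, langTag) {
  // Normalise "es_ES" style tags to "es-es" in case a platform reports underscores.
  const normalise = (tag) => String(tag).toLowerCase().replace(/_/g, '-');
  const wanted = normalise(langTag);
  const primary = wanted.split('-')[0];
  return (
    // Prefer an exact match such as "fr-fr" ...
    voices.find((v) => normalise(v.lang) === wanted) ||
    // ... otherwise accept any voice for the same primary language ("fr").
    voices.find((v) => normalise(v.lang).split('-')[0] === primary) ||
    null
  );
}

// In a browser (the voice list may be empty until the "voiceschanged" event has fired):
// utterance.voice = pickVoice(window.speechSynthesis.getVoices(), 'es');
```

If no voice matches, the function returns `null`, and the caller could then disable the button rather than read the text with the wrong engine.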

In any case, this is a great base on which to build. Thank you.

kasemmarifet commented 6 years ago

Thanks Jaifroid,

I just pushed an update to the code. We discussed this with Kelson a little bit (cced):

Kasem

Jaifroid commented 6 years ago

@kasemmarifet @kelson42 Thank you for adding the language drop-down. Below is a screenshot of the languages that it shows for my system in Edge. However, none of these voices actually work in Edge (42.17134): there is no sound at all on pressing "play". As I say, it works fine in Firefox Quantum and Chromium. I'll try to do some debugging at the weekend, but let me know if there's anything specific I should be looking for. The following demo page works fine in the same install of Edge:

https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/

Maybe it has some clues about what's necessary.

image

Jaifroid commented 6 years ago

@kasemmarifet - update, it does work on Edge! What happens is that it takes 8 or 9 seconds (I didn't time it precisely) for the audio to start, at least on the size of article I was testing. I had assumed it wasn't working after about six seconds, since it starts much faster on FF and Chromium, and was turning it off. I wonder if we could add an intermediate, "Please wait..." type of message, maybe in a small overlay message box or on the play button itself which would get cancelled once the audio starts. If I fell into this trap, others might. People aren't very patient nowadays, ahem.
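A rough sketch of how such a message could be wired up, assuming the utterance's `start` event is a good enough signal that audio has begun. The `notice` object with `show()` and `hide()` is hypothetical and stands in for whatever overlay or button label is used; `onstart`, `onend` and `onerror` are standard `SpeechSynthesisUtterance` event handlers.

```javascript
// Sketch only: show a "Please wait..." notice until the speech engine reports it has started.
function speakWithWaitNotice(synth, utterance, notice) {
  const hide = () => notice.hide();
  notice.show('Please wait...');
  utterance.onstart = hide; // audio has begun, so the notice is no longer needed
  utterance.onend = hide;   // also clear it if speech ends ...
  utterance.onerror = hide; // ... or fails before a start event was seen
  synth.speak(utterance);
}

// In a browser:
// speakWithWaitNotice(window.speechSynthesis, new SpeechSynthesisUtterance(text), myNotice);
```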

Jaifroid commented 6 years ago

A minor point, but would you be able to put the different voices into a bootstrap split-button dropdown, like in the image below? It would save space rather than having a separate dropdown box (I realize the box was added to debug the voices, but it's useful to have the choices).

image

Jaifroid commented 6 years ago

One other thing: how difficult would it be to exclude certain types of text? I don't think it's desirable for it to read the infoboxes, for example. Or this might be a configuration option. It would also be useful to exclude footnote reference numbers, which are also currently read (and slow down the reading as a result). Any ideas on how best to achieve that?
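One way this might be approached is to walk the article DOM and skip unwanted elements while collecting the text to be spoken. The sketch below assumes that infoboxes carry the class `infobox` and that footnote markers are `sup.reference` elements, as in typical Wikipedia markup; a given ZIM may use different selectors, so the list is illustrative and could equally come from a configuration option.

```javascript
// Sketch only: selectors are assumptions about the article markup, not taken from the PR.
const SKIPPED_SELECTORS = ['.infobox', 'sup.reference', 'style', 'script'];

// Recursively collect text from a DOM node, leaving out any element matching a skipped selector.
function collectReadableText(node, skipped = SKIPPED_SELECTORS) {
  const ELEMENT_NODE = 1;
  const TEXT_NODE = 3;
  if (node.nodeType === TEXT_NODE) {
    return node.nodeValue;
  }
  if (node.nodeType !== ELEMENT_NODE) {
    return ''; // comments and other node types are never read
  }
  if (skipped.some((selector) => node.matches(selector))) {
    return ''; // drop the whole subtree, e.g. an infobox or a footnote marker
  }
  return Array.from(node.childNodes)
    .map((child) => collectReadableText(child, skipped))
    .join('');
}

// In a browser: const text = collectReadableText(articleDocument.body);
```

Because the original DOM is only read, never modified, the displayed article is unaffected.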

Jaifroid commented 6 years ago

Just to summarize a few things that I believe are required to make a complete and genuinely useful solution for this issue, building on @kasemmarifet's work. I'd be very happy to work on some of these:

mossroy commented 5 years ago

I like this TODO-list, but it does not mean it's all-or-nothing. Based on what @kasemmarifet would be willing to do (and maybe what he discussed with @kelson42 ), we might decide together what is required to merge the PR #394, and what could be split into other improvement issues (that could be implemented later and/or by someone else)

Jaifroid commented 5 years ago

I completely agree -- we can certainly break this down into more than one PR. Before we could merge the current PR #394, it will need the first point above (choose the correct voice for the language of the ZIM), or the voice-selection option will need to move to the configuration page (it is currently displayed in an obtrusive manner), or both.

@kasemmarifet, could you let us know if you are able to do any further work on your PR? If your allotted GoogleServe time is up, please let us know so that we can work out how to take this forward.

kasemmarifet commented 5 years ago

Sorry for the late reply, I was on vacation.

Thanks for the list of further improvements. Can we put this change behind a flag (turned off by default)? Then we can do the work to get the locale for the document. Once we have the locale, we can use it to select the voice correctly.

I can't spend too much time this week on this but I can do the change to put this behind a flag. Can someone work on getting the locale in a separate PR?
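As an illustration of the flag idea only: the key name `readAloudEnabled` and both helper functions below are hypothetical, and the sketch assumes a `localStorage`-like object with `getItem()`, which may not be how the app actually stores its settings.

```javascript
// Sketch only: a feature flag that is off unless explicitly enabled.
const READ_ALOUD_FLAG = 'readAloudEnabled'; // hypothetical settings key

function isReadAloudEnabled(storage) {
  // A missing or unexpected value counts as "off", which gives the off-by-default behaviour.
  return storage.getItem(READ_ALOUD_FLAG) === 'true';
}

function shouldShowReadAloudButton(env, storage) {
  // Show the button only when the flag is on and the browser exposes speech synthesis.
  return isReadAloudEnabled(storage) && Boolean(env && env.speechSynthesis);
}

// In a browser: shouldShowReadAloudButton(window, window.localStorage);
```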

Jaifroid commented 5 years ago

@kasemmarifet We have a PR #397 for the language code, but it currently returns the code in ISO 639-3 format instead of BCP 47, which is what voice synthesis needs -- see the discussion in #395. However, I can do a PR that returns the BCP 47 code, as it is contained in the ZIM's meta Name attribute.
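To make the difference concrete, here is a hedged sketch of both routes. It assumes the Name value follows a `project_language_selection` pattern (for example `wikipedia_es_all`), which is not spelled out in this thread, and the ISO 639-3 lookup table is a tiny illustrative subset rather than a complete mapping. Both function names are hypothetical.

```javascript
// Sketch only: a few ISO 639-3 codes and their two-letter equivalents as used in BCP 47 tags.
const ISO_639_3_TO_BCP47 = new Map([
  ['eng', 'en'],
  ['fra', 'fr'],
  ['spa', 'es'],
  ['deu', 'de']
]);

// Convert a two- or three-letter language code to a BCP 47 primary language subtag.
function toBcp47(code) {
  const lower = String(code || '').toLowerCase();
  if (/^[a-z]{2}$/.test(lower)) {
    return lower; // already a two-letter code
  }
  // Unknown three-letter codes return null so the caller can fall back to a default voice.
  return ISO_639_3_TO_BCP47.get(lower) || null;
}

// Read the language from a ZIM Name value, assuming "project_language_selection".
function languageFromZimName(name) {
  const parts = String(name || '').split('_');
  return parts.length > 1 ? toBcp47(parts[1]) : null;
}
```

The result could then be assigned to `utterance.lang` or used to filter the available voices.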

Bam92 commented 4 years ago

@Jaifroid Any update about this issue?

ykabusalah commented 4 years ago

Can I tackle this issue?

Jaifroid commented 4 years ago

@ykabusalah Please look carefully at the discussion above. A PR was already made, but it was never completed. If you think you can complete the existing PR according to the requirements in the discussion above, please do try.

Rbcoder1 commented 11 months ago

If this issue is not yet solved, I am interested in solving it. Please assign it to me.

Jaifroid commented 11 months ago

Again, please work on one issue at a time. I think this issue depends on the UI changes, because room will need to be made for a "read aloud" button.

Rbcoder1 commented 11 months ago

Ok


Paulie-Aditya commented 7 months ago

Could I be assigned to this issue?

Jaifroid commented 7 months ago

@Paulie-Aditya Please take a look at https://github.com/kiwix/kiwix-js/blob/main/CONTRIBUTING.md, set up your development environment, and make sure you're happy with the process here. If all is well, please come back here outlining your suggestion of how to complete this issue so that I can assign you. A particular problem in the past has been how to organize the UI to invoke this function, so I'd be interested in what you propose.

In some browsers, reading aloud just works already. For example, in Edge, pressing Ctrl-Shift-U will start to read the loaded article and provide its own UI.

Hamza1821 commented 4 months ago

Is this still open? If it is, please assign it to me.

Jaifroid commented 4 months ago

@Hamza1821, please read the instructions above, and come back here with your proposed solution before you start coding. Be sure to read all the discussion above as well.