Implement text-to-speech for reading articles

mossroy commented 8 years ago

Like on the Android version of Kiwix.

Using the Speech Synthesis API: https://caniuse.com/#feat=speech-synthesis

kasemmarifet commented 6 years ago

I'll try to work on this as part of GoogleServe.

Jaifroid commented 6 years ago

Hi @kasemmarifet, thanks for your welcome contribution. I'm just testing your PR. It works offline on Chromium and Firefox Quantum on Windows 10 for English-language ZIMs, so congratulations!

I can't get it working on Microsoft Edge or Internet Explorer. On Edge, the speech button appears, and pressing it appears to initiate something, but no sound comes out. When I turn it off, a loudspeaker icon appears briefly in the browser tab for the page, indicating that some audio was activated at some point, but paradoxically only when the read button is turned off. There is still no sound. The API is described here: https://blogs.windows.com/msedgedev/2016/06/01/introducing-speech-synthesis-api/ , and there are some examples of how to use it for Edge, so maybe you can adapt your code.

On Internet Explorer there is no button, but that is expected, as it is not fully HTML5 compliant. It seems to degrade gracefully (by not showing the button).

However, when I load a Spanish-language ZIM in Firefox or a French-language ZIM in Chromium, the read button tries to read the text with the English-language text-to-speech engine. I'm sure that can be remedied fairly easily.

In any case, this is a great base on which to build. Thank you.

kasemmarifet commented 6 years ago

Thanks Jaifroid,

I just pushed an update to the code. We discussed this a little with Kelson (cc'd):

- If the browser doesn't have the API (like IE), the code should handle it and not show the button.
- By default, the speech voice is the system one. We looked into whether we can get the content language and match the voice to it, but this seems to be missing from the JS library. So further work is needed here to get the content language and load the correct voice based on it. I added a TODO for that.
- For Edge: I'm not sure what the issue is there. From your description it looks like there is an issue with the voice. Can you please try selecting a different voice from the dropdown?
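
A minimal sketch of the first point above (hide the button when the API is missing), not the code from the PR. The function names `isSpeechSynthesisSupported` and `toggleReadButton` are hypothetical, and the global object is passed in as a parameter only so that the check can be exercised outside a browser:

```javascript
// Returns true when the given global object exposes the Speech Synthesis API.
// In a browser, pass `window`.
function isSpeechSynthesisSupported(globalObj) {
    return Boolean(globalObj) &&
        'speechSynthesis' in globalObj &&
        typeof globalObj.SpeechSynthesisUtterance === 'function';
}

// Shows or hides the read button depending on API support.
// `button` is any object with a `style.display` property, e.g. a DOM element.
// (Hypothetical helper, named for illustration only.)
function toggleReadButton(button, globalObj) {
    var supported = isSpeechSynthesisSupported(globalObj);
    button.style.display = supported ? '' : 'none';
    return supported;
}
```

In a page this might be called once at startup as `toggleReadButton(document.getElementById('readButton'), window)`, where the element id is also an assumption.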

Kasem

Jaifroid commented 6 years ago

@kasemmarifet @kelson42 Thank you for adding the language drop-down. Below is a screenshot of the languages that it shows for my system in Edge. However, none of these voices actually works in Edge (42.17134): there is no sound at all on pressing "play". As I say, it works fine in Firefox Quantum and Chromium. I'll try to do some debugging at the weekend, but let me know if there's anything specific I should be looking for. The following demo page works fine in the same install of Edge:

https://developer.microsoft.com/en-us/microsoft-edge/testdrive/demos/speechsynthesis/

Maybe it has some clues about what's necessary.

Jaifroid commented 6 years ago

@kasemmarifet - update, it does work on Edge! What happens is that it takes 8 or 9 seconds (I didn't time it precisely) for the audio to start, at least on the size of article I was testing. I had assumed it wasn't working after about six seconds, since it starts much faster on FF and Chromium, and was turning it off. I wonder if we could add an intermediate, "Please wait..." type of message, maybe in a small overlay message box or on the play button itself which would get cancelled once the audio starts. If I fell into this trap, others might. People aren't very patient nowadays, ahem.
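
One way to approach the "Please wait..." idea is to listen for the utterance's `start` event, which the Speech Synthesis API fires when audio actually begins. The following is only a sketch under that assumption: `speakWithWaitState` and the `setState` callback are hypothetical names, and the synthesizer and utterance constructor are passed in so the logic does not depend on a particular browser:

```javascript
// Speaks `text` and reports UI state changes through `setState`.
// `synth` is expected to behave like window.speechSynthesis and
// `Utterance` like the SpeechSynthesisUtterance constructor.
function speakWithWaitState(text, synth, Utterance, setState) {
    var utterance = new Utterance(text);
    // Before audio starts: e.g. disable the button and show "Please wait..."
    setState('waiting');
    // Fired by the engine once it actually starts speaking
    utterance.onstart = function () { setState('playing'); };
    // Fired when speech finishes, or when something goes wrong
    utterance.onend = function () { setState('idle'); };
    utterance.onerror = function () { setState('idle'); };
    synth.speak(utterance);
    return utterance;
}
```

In a browser the call might look like `speakWithWaitState(articleText, window.speechSynthesis, window.SpeechSynthesisUtterance, updateReadButton)`, where `updateReadButton` would be whatever function changes the button label.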

Jaifroid commented 6 years ago

A minor point, but would you be able to put the different voices into a bootstrap split-button dropdown, like in the image below? It would save space rather than having a separate dropdown box (I realize the box was added to debug the voices, but it's useful to have the choices).

Jaifroid commented 6 years ago

One other thing: how difficult would it be to exclude certain types of text? I don't think it's desirable for it to read the infoboxes, for example. Or this might be a configuration option. It would also be useful to exclude footnote reference numbers, which are also currently read (and slow down the reading as a result). Any ideas on how best to achieve that?
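
A possible approach, sketched under assumptions: clone the article node, remove the unwanted elements from the clone, and read the text that remains. The selectors below are guesses based on typical Wikipedia markup and would need checking against real ZIM content; `getReadableText` is a hypothetical name:

```javascript
// CSS selectors for content that should not be read aloud.
// Assumed from typical Wikipedia markup, not taken from kiwix-js.
var DEFAULT_SKIP_SELECTORS = ['table.infobox', '.navbox', 'sup.reference'];

// Returns the readable text of `root` without the skipped elements.
// Works on a clone so that the displayed article is left untouched.
function getReadableText(root, skipSelectors) {
    var selectors = skipSelectors || DEFAULT_SKIP_SELECTORS;
    var clone = root.cloneNode(true);
    var unwanted = clone.querySelectorAll(selectors.join(','));
    Array.prototype.forEach.call(unwanted, function (el) {
        if (el.parentNode) el.parentNode.removeChild(el);
    });
    // Collapse the whitespace left behind by the removed elements
    return clone.textContent.replace(/\s+/g, ' ').trim();
}
```

Passing a custom selector list as the second argument would be one way to make reading of infoboxes a configuration option.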

Jaifroid commented 6 years ago

Just to summarize a few things that I believe are required to make a complete and genuinely useful solution for this issue, building on @kasemmarifet's work. I'd be very happy to work on some of these:

- Choose the correct voice for the natural language of the ZIM (see discussion in #395), with the system default being the first choice if it matches the language of the ZIM; however, the user should be able to override the auto-selected voice (e.g., if they want a female voice rather than a male one, or vice versa);
- Remember the chosen voice on a per-ZIM basis (cookie value);
- Display the availability of the speech synthesis API in the API panel on the Configuration page; maybe also display the auto- or user-selected voice for the currently loaded ZIM; maybe we should also move the voice-selection options to the Configuration page;
- Read only the main text of the article, not text in info boxes or nav boxes, or footnote reference numbers (reading info boxes could be an option in Configuration);
- While preparing for text-to-speech synthesis, make the Read button unselectable and have it show "please wait"; change to Stop button once reading has started;
- Allow reading from the cursor or selected text position: most Wikipedia articles are quite long, so if we always start at the beginning of the article, the usefulness of a reading feature becomes extremely limited;
- Provide a way to vary the speed of reading: this is essential in the long run, in my opinion, if the feature is to be of real use, but it could be added as a todo;
- Ideally, highlight the words being read using utterance.onboundary - see https://stackoverflow.com/questions/38120478/speech-synthesis-api-highlight-words-as-they-are-spoken . I think users will expect this and expect to know where they are in the article. It also makes the feature genuinely useful for things like language learning.
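
The last point could be sketched roughly as follows. Boundary events carry a `charIndex` into the spoken text; `charLength` is not reported by every browser, so this sketch falls back to scanning for the next whitespace. `trackSpokenWords` and the `onWord` callback are hypothetical names, and not every engine or voice necessarily fires boundary events:

```javascript
// Calls onWord(start, end, word) each time the engine reports a word boundary.
// `utterance` is expected to behave like a SpeechSynthesisUtterance created from `text`.
function trackSpokenWords(utterance, text, onWord) {
    utterance.onboundary = function (event) {
        if (event.name !== 'word') return; // ignore sentence boundaries
        var start = event.charIndex;
        var length = event.charLength;
        // Fall back to "up to the next whitespace" when charLength is unavailable
        if (typeof length !== 'number' || length <= 0) {
            var rest = text.slice(start);
            var match = /\s/.exec(rest);
            length = match ? match.index : rest.length;
        }
        onWord(start, start + length, text.slice(start, start + length));
    };
    return utterance;
}
```

The callback receives character offsets into the plain text, so mapping them back to DOM positions for the actual highlighting is a separate step that is not shown here.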

mossroy commented 5 years ago

I like this TODO-list, but it does not mean it's all-or-nothing. Based on what @kasemmarifet would be willing to do (and maybe what he discussed with @kelson42), we might decide together what is required to merge PR #394, and what could be split into other improvement issues (that could be implemented later and/or by someone else).

Jaifroid commented 5 years ago

I completely agree: we can certainly break this down into more than one PR. Before we can merge the current PR #394, it will need the first point above (choosing the correct voice for the language of the ZIM), or the voice-selection option moved to the Configuration page (it is currently displayed in an obtrusive manner), or both.

@kasemmarifet, could you let us know if you are able to do any further work on your PR? If your allotted GoogleServe time is up, please let us know so that we can work out how to take this forward.

kasemmarifet commented 5 years ago

Sorry for the late reply, I was on vacation.

Thanks for the list of further improvements. Can we put this change behind a flag (turned off by default), and then do the work to get the locale for the document? Once we have the locale, we can use it to select the voice correctly.

I can't spend too much time on this this week, but I can make the change to put it behind a flag. Can someone work on getting the locale in a separate PR?
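
Once a locale is available, voice selection could look something like the sketch below. It follows the preference proposed earlier in the thread (the system default voice wins if it matches the language) and is not taken from the PR. `pickVoice` is a hypothetical name; the `lang` and `default` properties are the ones exposed on the voices returned by `speechSynthesis.getVoices()`:

```javascript
// Picks a voice for a BCP 47 language tag such as "fr" or "pt-BR".
// `voices` is an array shaped like the result of speechSynthesis.getVoices().
// Returns null when no voice matches the primary language.
function pickVoice(voices, lang) {
    function normalize(tag) {
        // Defensive: tolerate tags written with an underscore, e.g. "en_US"
        return String(tag).toLowerCase().replace(/_/g, '-');
    }
    var wanted = normalize(lang);
    var primary = wanted.split('-')[0];
    var candidates = voices.filter(function (v) {
        return normalize(v.lang).split('-')[0] === primary;
    });
    if (candidates.length === 0) return null;
    var systemDefault = candidates.filter(function (v) { return v.default; })[0];
    var exact = candidates.filter(function (v) { return normalize(v.lang) === wanted; })[0];
    // Preference order: system default, exact tag match, then first match
    return systemDefault || exact || candidates[0];
}
```

The result would be assigned to `utterance.voice` before calling `speak()`. A user override, as requested in the list above, would simply bypass this function.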

Jaifroid commented 5 years ago

@kasemmarifet We have a PR #397 for the language code, but it currently returns ISO 639-3 instead of the BCP 47 format that voice synthesis needs (see discussion in #395). However, I can do a PR for returning the BCP 47 code, as it is contained in the ZIM's meta Name attribute.
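
For illustration, the difference between the two formats: BCP 47 uses the two-letter ISO 639-1 code as the primary subtag where one exists, and the three-letter code otherwise. A conversion could therefore be sketched as a lookup with a fallback. The table below is a deliberately tiny subset and `toBcp47` is a hypothetical name; this is not the approach of PR #397, and it ignores regions, scripts and macrolanguage subtleties:

```javascript
// Illustrative subset only: a real table would cover every language
// that has a two-letter ISO 639-1 code.
var ISO_639_3_TO_639_1 = { eng: 'en', fra: 'fr', spa: 'es', deu: 'de', ara: 'ar' };

// Converts an ISO 639-3 code to a BCP 47 primary language subtag.
function toBcp47(iso6393Code) {
    var code = String(iso6393Code).trim().toLowerCase();
    var hasShortCode = Object.prototype.hasOwnProperty.call(ISO_639_3_TO_639_1, code);
    // Languages without a two-letter code keep their three-letter code
    return hasShortCode ? ISO_639_3_TO_639_1[code] : code;
}
```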

Bam92 commented 4 years ago

@Jaifroid Any update on this issue?

ykabusalah commented 4 years ago

Can I tackle this issue?

Jaifroid commented 4 years ago

@ykabusalah Please look carefully at the discussion above. A PR was already made, but it was never completed. If you think you can complete the existing PR according to the requirements in the discussion above, please do try.

Rbcoder1 commented 11 months ago

If this issue is not solved, I am interested in solving it. Please assign it to me.

Jaifroid commented 11 months ago

Again, please work on one issue at a time. I think this issue depends on the UI changes, because room will need to be made for a "read aloud" button.

Rbcoder1 commented 11 months ago

Ok


Paulie-Aditya commented 7 months ago

Could I be assigned to this issue?

Jaifroid commented 7 months ago

@Paulie-Aditya Please take a look at https://github.com/kiwix/kiwix-js/blob/main/CONTRIBUTING.md, set up your development environment, and make sure you're happy with the process here. If all is well, please come back here outlining your suggestion of how to complete this issue so that I can assign you. A particular problem in the past has been how to organize the UI to invoke this function, so I'd be interested in what you propose.

In some browsers, reading aloud just works already. For example, in Edge, pressing Ctrl-Shift-U will start to read the loaded article and provide its own UI.

Hamza1821 commented 4 months ago

Is it open? If it is, please assign it to me.

Jaifroid commented 4 months ago

@Hamza1821, please read the instructions above, and come back here with your proposed solution before you start coding. Be sure to read all the discussion above as well.

kiwix / kiwix-js

Implement text-to-speech for reading articles #166