mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Any way to fix pronunciation of words? #7

Closed linlinxza closed 1 year ago

linlinxza commented 1 year ago

I find this is a brilliant app for Linux. But, is there some way to fix how certain names or words are pronounced?

mkiol commented 1 year ago

Thank you for the question.

There are two ways to have proper punctuation for Speech to Text:

  • Use Whisper models (Tiny, Base, Small or Medium). Whisper models decode text with punctuation. They are great but have some flaws. For instance, Whisper is rather slow (especially Small and Medium) and doesn't provide partial results during decoding (you don't see immediate text that becomes more accurate as you speak). If you can't use Whisper, the second option might help.
  • Use the 'Restore punctuation' option. You can enable it in the Settings (hamburger menu->Settings->Restore punctuation). It is in the 'Experiments' section and is turned off by default because, when enabled, model initialization takes a lot more time. To make 'Restore punctuation' work you must also download the additional 'Punctuation' model for your language (it is at the very bottom of the models download list, in the 'Text to Text' section). Only the following languages are supported right now: English, German, French, Spanish, Bulgarian, Italian, Polish, Dutch, Czech, Portuguese, Slovak, Slovenian, Greek, Swedish, Danish, Hungarian and Romanian. When the 'Restore punctuation' option is enabled and a 'Punctuation' model is set, Speech to Text decoding with Vosk and DeepSpeech/Coqui will produce text with punctuation. It works quite well but not perfectly.

Update: The 'Restore punctuation' option doesn't work right now for Slovak and Danish, but it will in an upcoming version.

Update:

Sorry, for some reason I wrongly assumed that you were asking about 'punctuation' instead of 'pronunciation' 😄.

Which language / model do you observe the problem with?

In general, the app just provides what is produced by the Text to Speech engine (Piper, RHVoice, Coqui, eSpeak or MBROLA). But I can check whether something related to pronunciation was fixed in these projects and integrate the changes into Speech Note.

linlinxza commented 1 year ago

Yeah. Pronunciation. I am just looking to see if I can fix the pronunciation of character names. For example, DBZ (Goku, Gohan, Chichi, Vegeta) or Inuyasha character names (Naraku, Kikyo, etc.). I am writing a fanfiction crossover. If you are familiar with anime, you will notice characters have weird names.

I have been looking for a text to speech solution for proofreading. And I find this is the best one for Linux, one that doesn't require an Internet connection, no less.

Anyway, it would be great if there was a way to manually fix the pronunciation of names. Because then that would be less work for you. Currently, I am using English (Piper LibriTTS High). It sounds great. Like Natural Readers.

mkiol commented 1 year ago

Ok, I understand.

I've tested these names (Goku, Gohan, Chichi, Vegeta) and indeed pronunciation is not great :/ Did you try 'Coqui LJSpeech VITS Female' model? It might be better in my opinion.

It looks like the Piper engine had an update. I will integrate the changes into Speech Note and check whether anything has improved. Give me a few days.

linlinxza commented 1 year ago

It's not a huge deal. I mean, it still works great (with the current voice I am using, it's very human sounding). And I found a workaround. I'll just copy the text of each document and paste it into a program like Kate (text editor) and then just replace all instances of the names like so:

Goku = Gohku

Gohan = Gohahn

Chichi = CheeChee

Vegeta = Vegeeta

Naraku = Naarrahku

Kikyo = Keekyo

Inuyasha = Inewyasha

Then I'll just paste it into Speech Note. It works. It's a small price to pay for privacy.

This is what I did. I made a script to have LibreOffice convert the odt file to a text file and then use sed to replace words via a definition file, send it to the clipboard, and then remove the generated text file as there is no more use for it:

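#!/bin/bash
# Convert an .odt file to plain text, apply pronunciation replacements,
# and copy the result to the clipboard.
FILE_TO_EDIT="$1"
PLAIN_TEXT_FILE=$(basename -- "$FILE_TO_EDIT" .odt)

# Each line of the definition file has the form word/replacement.
PRONUNCIATION_FILE=pronunciation.txt
if [ -f "$PRONUNCIATION_FILE" ]; then
    flatpak run org.libreoffice.LibreOffice --convert-to txt "$FILE_TO_EDIT"
    # Read line by line so entries with spaces are not split; skip blank lines.
    while IFS= read -r line; do
        [ -n "$line" ] && sed -i "s/$line/g" "$PLAIN_TEXT_FILE.txt"
    done < "$PRONUNCIATION_FILE"
    xclip -selection clipboard < "$PLAIN_TEXT_FILE.txt"
    rm -- "$PLAIN_TEXT_FILE.txt"
else
    echo "$PRONUNCIATION_FILE does not exist, or is missing."
fi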

And this is what my pronunciation file looks like:

Goku/Gohku

Gohan/Gohahn

Chichi/CheeChee

Vegeta/Vegeeta

Naraku/Naarrahku

Kikyo/Keekyo

Inuyasha/Inewyasha

Guess I'll consider this as solved.
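
As a possible refinement of the approach above, the definition file can be turned into a single sed script and applied in one pass, instead of running sed once per entry. This is only a minimal sketch: `apply_pronunciations` is a hypothetical helper name, and it assumes every non-blank line of the definition file has the word/replacement form shown above, with no extra `/` characters.

```shell
#!/bin/sh
# Sketch: apply every "word/replacement" rule from a definition file
# to standard input in a single sed pass.
# apply_pronunciations is a hypothetical helper, not part of Speech Note.
apply_pronunciations() {
    # Drop blank lines, then wrap each entry as s/word/replacement/g.
    rules=$(sed -e '/^[[:space:]]*$/d' -e 's|.*|s/&/g|' "$1")
    # Newlines separate the generated commands inside one sed script.
    sed -e "$rules"
}
```

It could be used as `apply_pronunciations pronunciation.txt < story.txt`, where `story.txt` stands for whatever text file LibreOffice produced. Because the function reads standard input and writes standard output, the converted file itself is left untouched.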

mkiol commented 1 year ago

That is very smart 👍🏿

Just a thought... since you are already using scripts to process your input data, maybe instead of Speech Note (a GUI app) it would be easier for you to use piper directly from the command line to produce a wav file with the synthesized speech. If you decide to go this way, you need to download your favorite TTS model from here as well.
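
For illustration, the command-line route might look roughly like the sketch below. It assumes the `piper` executable is on the PATH and that a voice model has already been downloaded; `speak_to_wav` and the `PIPER_BIN` variable are hypothetical names introduced here, and the model file name in the usage note is only a placeholder. Piper reads the text from standard input and takes the model and output path through its `--model` and `--output_file` options.

```shell
#!/bin/sh
# Sketch: synthesize a text file to a wav file with the piper CLI.
# Usage: speak_to_wav TEXT_FILE MODEL_FILE OUT_WAV
# PIPER_BIN is a hypothetical override for the piper executable path.
speak_to_wav() {
    # piper takes the text on standard input and writes the audio
    # to the path given with --output_file.
    "${PIPER_BIN:-piper}" --model "$2" --output_file "$3" < "$1"
}
```

A call such as `speak_to_wav chapter.txt en_US-libritts-high.onnx chapter.wav` would then produce the audio file, provided a model file with that name exists locally.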

linlinxza commented 1 year ago

Nah. I'm okay like this. Besides, I'd like to hear it on the fly. Also, I made it so that I can convert odt files easily with the context menu in the Dolphin File Manager on the KDE Plasma desktop.

Thanks for calling it smart. Maybe you might want to recommend this script to people who have issues with pronunciation.

mkiol commented 1 year ago

Maybe you might want to recommend this script to people who have issues with pronunciation.

Sure. I will do that!

I have one small request. Would you be able to test a 'beta' version of the app? I've updated the piper library and this might have an impact on pronunciation.

You can install this test version with:

flatpak install --user https://dl.flathub.org/build-repo/32714/net.mkiol.SpeechNote.flatpakref

To run test version:

flatpak run --branch=test net.mkiol.SpeechNote

To uninstall test version:

flatpak uninstall app/net.mkiol.SpeechNote//test

Did you observe any improvements?

linlinxza commented 1 year ago

Did you observe any improvements?

No. Not when it comes to the pronunciation of character names. There is no change. I used Piper LibriTTS High as well. I think this is a limitation of the models that are used. Not your app.

I still have to spell out the names in such a way as to make the speech synthesizer say them the way they should sound. Honestly, after writing the script, I can work with this. Now I can use the context menu: I just right click in the file manager on the file I want to prepare for speech synthesis. Then I just paste it into the app. :)

BTW, in the script, you will notice "xclip" is being used to send the modified text to the clipboard. But if someone is using Wayland as their display server, then they will have to use "wl-copy" (from the wl-clipboard package) instead. And keep in mind that I have set this script to only work with odt files. So, whoever uses it will have to change the extension if they use a different document format. But I gather you know this.
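
The clipboard choice could also be made automatically. The sketch below is one way to do it, assuming the session exports `XDG_SESSION_TYPE` or `WAYLAND_DISPLAY` (most desktop sessions do, but that is an assumption) and that the matching tool is installed; `clipboard_cmd` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the clipboard command that fits the current display server.
# clipboard_cmd is a hypothetical helper; it only prints the command,
# it does not check that the tool is installed.
clipboard_cmd() {
    if [ "${XDG_SESSION_TYPE:-}" = "wayland" ] || [ -n "${WAYLAND_DISPLAY:-}" ]; then
        echo "wl-copy"
    else
        echo "xclip -selection clipboard"
    fi
}
```

In the script, the xclip line could then become `$(clipboard_cmd) < "$PLAIN_TEXT_FILE.txt"`. The command substitution is left unquoted on purpose so that the xclip arguments are split into separate words.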

Anyway, maybe to make this easier for people, what you could do is somehow integrate what I did into the GUI app? But, I just know simple scripting. I'm no software developer. So, I have no idea whether this is possible or not. My forte is more hardware and networking. lol

mkiol commented 1 year ago

There is no change

Thank you for testing.

Anyway, maybe to make this easier for people, what you could do is somehow integrate what I did into the GUI app?

Your automation is clever and very useful for your specific need. I'm not sure if this kind of feature fits well into this particular app. Anyway, I'll think about it :)

linlinxza commented 1 year ago

Thanks. Anyway, alright then.