Closed: linlinxza closed this issue 1 year ago
Thank you for the question.
There are two ways to have proper punctuation for Speech to Text:
Update: The 'Restore punctuation' option doesn't work right now for Slovak and Danish, but it will in an upcoming version.
Update:
Sorry, for some reason I wrongly assumed that you were asking about 'punctuation' instead of 'pronunciation' 😄.
Which language / model do you observe the problem with?
In general, the app just provides what is produced by the Text to Speech engine (Piper, RHVoice, Coqui, eSpeak or MBROLA). But I can check whether anything related to pronunciation was fixed in these projects and integrate the changes into Speech Note.
Yeah. Pronunciation. I am just looking to see if I can fix the pronunciation of character names. For example, DBZ character names (Goku, Gohan, Chichi, Vegeta) or Inuyasha character names (Naraku, Kikyo, etc.). I am writing a fanfiction crossover. If you are familiar with anime, you will notice characters have weird names.
I have been looking for a text to speech solution for proofreading, and I find this is the best one for Linux. It doesn't require an Internet connection, no less.
Anyway, it would be great if there was a way to manually fix the pronunciation of names. Because then that would be less work for you. Currently, I am using English (Piper LibriTTS High). It sounds great. Like Natural Readers.
Ok, I understand.
I've tested these names (Goku, Gohan, Chichi, Vegeta) and indeed the pronunciation is not great :/ Did you try the 'Coqui LJSpeech VITS Female' model? It might be better in my opinion.
It looks like the Piper engine had an update. I will integrate the changes into Speech Note and check whether anything has improved. Give me a few days.
It's not a huge deal. I mean, it still works great (with the current voice I am using, it's very human sounding). And I found a workaround. I'll just copy the text of each document and paste it into a program like Kate (text editor) and then just replace all instances of the names like so:
Goku = Gohku
Gohan = Gohahn
Chichi = CheeChee
Vegeta = Vegeeta
Naraku = Naarrahku
Kikyo = Keekyo
Inuyasha = Inewyasha
Then I'll just paste it into Speech Note. It works. It's a small price to pay for privacy.
This is what I did. I made a script to have LibreOffice convert the odt file to a text file and then use sed to replace words via a definition file, send it to the clipboard, and then remove the generated text file as there is no more use for it:
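#!/bin/bash
# Convert an .odt file to plain text, apply the pronunciation replacements,
# and copy the result to the clipboard.
FILE_TO_EDIT="$1"
PLAIN_TEXT_FILE=$(basename -- "$FILE_TO_EDIT" .odt)
PRONUNCIATION_FILE=pronunciation.txt
if [ -f "$PRONUNCIATION_FILE" ]; then
    # LibreOffice writes "$PLAIN_TEXT_FILE.txt" into the current directory
    flatpak run org.libreoffice.LibreOffice --convert-to txt "$FILE_TO_EDIT"
    # Each line is "word/replacement", which slots into sed's s/word/replacement/g
    while IFS= read -r line; do
        if [ -n "$line" ]; then sed -i "s/$line/g" "$PLAIN_TEXT_FILE.txt"; fi
    done < "$PRONUNCIATION_FILE"
    xclip -selection clipboard < "$PLAIN_TEXT_FILE.txt"
    rm -- "$PLAIN_TEXT_FILE.txt"
else
    echo "$PRONUNCIATION_FILE does not exist, or is missing."
fi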
And this is what my pronunciation file looks like:
Goku/Gohku
Gohan/Gohahn
Chichi/CheeChee
Vegeta/Vegeeta
Naraku/Naarrahku
Kikyo/Keekyo
Inuyasha/Inewyasha
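To see what the definition file does on its own, here is a minimal sketch that applies the same "word/replacement" rules to a sample sentence instead of a converted document. The sample sentence, the temporary directory, and the `apply_rules` helper name are assumptions made for illustration, and `sed -i` is assumed to be GNU sed as found on most Linux systems.

```shell
#!/bin/bash
# apply_rules RULES_FILE TEXT_FILE: hypothetical helper that rewrites TEXT_FILE in place.
# Each non-empty line of RULES_FILE has the form word/replacement.
apply_rules() {
    local rules_file="$1" text_file="$2" rule
    while IFS= read -r rule; do
        if [ -n "$rule" ]; then
            # "Goku/Gohku" becomes the sed expression s/Goku/Gohku/g (GNU sed -i assumed)
            sed -i "s/$rule/g" "$text_file"
        fi
    done < "$rules_file"
}

# Sample data in a throwaway directory (illustrative only)
workdir=$(mktemp -d)
printf 'Goku/Gohku\nVegeta/Vegeeta\n' > "$workdir/pronunciation.txt"
printf 'Goku and Vegeta trained all day.\n' > "$workdir/sample.txt"

apply_rules "$workdir/pronunciation.txt" "$workdir/sample.txt"
result=$(cat "$workdir/sample.txt")
echo "$result"   # prints: Gohku and Vegeeta trained all day.
rm -r "$workdir"
```

Because each rule is pasted straight into a sed expression, a word or replacement containing "/" or other characters special to sed would need escaping first.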
Then I'll just paste it into Speech Note. It works. It's a small price to pay for privacy. Guess I'll consider this as solved.
That is very smart 👍🏿
Just a thought... since you are already using scripts to process your input data, maybe instead of Speech Note (a GUI app) it would be easier for you to use piper directly from the command line to produce a wav file with the synthesized speech. If you decide to go this way, you need to download your favorite TTS model from here as well.
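For reference, a rough sketch of what the command-line route could look like. It assumes the `piper` binary is on the PATH and that a voice model has already been downloaded; the model path and the `synthesize` helper name below are hypothetical, so adjust them to your setup. The sketch only runs piper if it is actually installed.

```shell
#!/bin/bash
# Hypothetical model location: assumes a Piper voice (.onnx plus its .onnx.json) was saved here.
MODEL="${MODEL:-$HOME/piper-voices/en_US-libritts-high.onnx}"

# Read text on stdin and write the synthesized speech to the wav file named in $1.
synthesize() {
    piper --model "$MODEL" --output_file "$1"
}

if command -v piper >/dev/null 2>&1; then
    echo 'Gohku and Vegeeta trained all day.' | synthesize names.wav
else
    echo "piper not found in PATH; skipping synthesis" >&2
fi
```

The text piped into `synthesize` can be the already-respelled output of a replacement script, so the same definition file works for both approaches.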
Nah. I'm okay like this. Besides, I'd like to hear it on the fly. Also, I made it so that I can convert odt files easily with the context menu in the Dolphin File Manager on the KDE Plasma desktop.
Thanks for calling it smart. Maybe you might want to recommend this script to people who have issues with pronunciation.
Sure. I will do that!
I have one small request. Would you be able to test the 'beta' version of the app? I've updated the piper library and this might have an impact on pronunciation.
You can install this test version with:
flatpak install --user https://dl.flathub.org/build-repo/32714/net.mkiol.SpeechNote.flatpakref
To run the test version:
flatpak run --branch=test net.mkiol.SpeechNote
To uninstall the test version:
flatpak uninstall app/net.mkiol.SpeechNote//test
Did you observe any improvements?
No. Not when it comes to the pronunciation of character names. There is no change. I used Piper LibriTTS High as well. I think this is a limitation of the models that are used. Not your app.
I still have to spell out the names in such a way as to make the speech synthesizer say them the way they should sound. Honestly, after writing the script, I can work with this. Now I can use the context menu to just right click in the file manager on the file I want to prepare for speech synthesis. Then I just paste it into the app. :)
BTW, in the script, you will notice "xclip" is being used to send the modified text to the clipboard. But if someone is using Wayland as their display server, then they will have to use "wl-copy" (from the wl-clipboard package) instead. And keep in mind that I have set this script to only work with odt files. So, whoever uses it will have to change the extension if they use a different document format. But I gather you know this.
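If the script has to run on both X11 and Wayland, the clipboard tool could be picked at runtime. This is only a sketch: it assumes `wl-copy` (from the wl-clipboard package) is installed on Wayland sessions and `xclip` on X11 ones, and it treats a non-empty `WAYLAND_DISPLAY` variable as the sign of a Wayland session. The `clipboard_cmd` helper name is made up for this example.

```shell
#!/bin/bash
# Print the clipboard command for the current session.
# Assumption: WAYLAND_DISPLAY is set only inside a Wayland session.
clipboard_cmd() {
    if [ -n "$WAYLAND_DISPLAY" ]; then
        echo "wl-copy"
    else
        echo "xclip -selection clipboard"
    fi
}

echo "Clipboard command for this session: $(clipboard_cmd)"
```

In the script, the xclip line would then become something like `$(clipboard_cmd) < "$PLAIN_TEXT_FILE.txt"`, since both tools read the text to copy from standard input.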
Anyway, maybe to make this easier for people, what you could do is somehow integrate what I did into the GUI app? But, I just know simple scripting. I'm no software developer. So, I have no idea whether this is possible or not. My forte is more hardware and networking. lol
Thank you for testing.
Your automation is clever and very useful for your specific need. I'm not sure if this kind of feature fits well into this particular app. Anyway, I'll think about it :)
Thanks. Anyway, alright then.
I find this is a brilliant app for Linux. But, is there some way to fix how certain names or words are pronounced?