readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0
2.45k stars, 218 forks

How to use Windows 10 TTS engine for Hebrew language? #248

Closed: Liranbz closed this issue 3 years ago

Liranbz commented 4 years ago

Hi! Thank you for your amazing tool. I am trying to use the Windows 10 TTS engine through Jampal PTTS (http://jampal.sourceforge.net/ptts.html). I use the Windows TTS because I need the Hebrew language, so I can't use eSpeak. This is the command I run from cmd:

C:\Users\liran_bz\Desktop\train data w2v>python -m aeneas.tools.execute_task all_speak.wav all_speak.txt "task_language=heb|os_task_file_format=srt|is_text_type=plain" all_speak.srt -r="tts=custom|tts_path=C:\Program Files\Jampal\ptts.vbs |allow_unlisted_languages=True"

[WARN] The default input encoding is not UTF-8.
[WARN] You might want to set 'PYTHONIOENCODING=UTF-8' in your shell.
[WARN] The default output encoding is not UTF-8.
[WARN] You might want to set 'PYTHONIOENCODING=UTF-8' in your shell.
[INFO] Validating config string (specify --skip-validator to bypass)...
[INFO] Validating config string... done
[INFO] Creating task...
[INFO] Creating task... done
[INFO] Executing task...
[ERRO] An unexpected error occurred while executing the task:
[ERRO] Unexpected error while executing task : Unable to load custom TTS wrapper : EOL while scanning string literal (ptts.vbs, line 1)

What am I doing wrong? Thank you for your help!

chrisvaughn commented 4 years ago

I can't help with the Windows-specific question, but my experience may help with Hebrew. I have had pretty good luck transliterating the text with unidecode (https://pypi.org/project/Unidecode/) and then using espeak with the language set to Esperanto. Since the goal of Esperanto was to be an easy-to-use global language, it turns out that the rules of its TTS model work surprisingly well on transliterated text. I've done this with Hebrew and Japanese so far. You may want to give that a try. The more I've used aeneas, the more I've found that a better-sounding TTS doesn't necessarily give better results.

Liranbz commented 4 years ago

Hi @chrisvaughn, thank you for your answer! So you first use the unidecode library and then run aeneas, but how do you get the original Hebrew words back? Do you replace them one by one? Thank you!

chrisvaughn commented 4 years ago

Hi @Liranbz -

You have 2 options. Installing https://pypi.org/project/Unidecode/ gives you a unidecode command that takes a file and outputs the transliterated equivalent, like:

unidecode hebrew.txt > transliterated.txt

If you already have a Python script that prepares your aeneas input, you can also use the unidecode library directly, like:

from unidecode import unidecode

def transliterate(lines):
    # Replace each line in place with its ASCII transliteration.
    for i, line in enumerate(lines):
        lines[i] = unidecode(line)
    return lines
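
One way to keep the original Hebrew in the final subtitles, sketched under assumptions: run the alignment on the transliterated text, then put the original lines back by position. This assumes aeneas emits one SRT cue per non-empty input line, in the same order, which is how plain text input is normally treated. The function name restore_original_text and the sample cue text are hypothetical.

```python
import re


def restore_original_text(srt_text, original_lines):
    """Swap the text of each SRT cue for the original line at the same position."""
    # SRT cues are separated by blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", srt_text.strip()) if b.strip()]
    if len(blocks) != len(original_lines):
        raise ValueError("cue count and line count differ; cannot map by position")
    restored = []
    for block, original in zip(blocks, original_lines):
        # First line is the cue index, second line is the timing.
        index, timing = block.splitlines()[:2]
        restored.append("\n".join([index.strip(), timing.strip(), original]))
    return "\n\n".join(restored) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:00:00,000 --> 00:00:01,500\ntranslit one\n\n"
        "2\n00:00:01,500 --> 00:00:03,000\ntranslit two\n"
    )
    print(restore_original_text(sample, ["שלום", "עולם"]))
```

The length check is deliberate: if the counts differ, mapping by position would silently attach the wrong text to a timing, so failing early seems safer.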

Liranbz commented 4 years ago

Hi @chrisvaughn, thank you. I tried your solution, and it works on 2 short samples. On the other hand, it's not 100% successful, maybe because it uses an Esperanto engine rather than a Hebrew one? Thank you.

chrisvaughn commented 4 years ago

Yes. It's true that for your text and audio you may have better luck with a Hebrew engine. I offered it as a suggestion in case it worked for you and was easier than solving the custom TTS issue you were having. I hope you are able to find success one way or another.

Liranbz commented 4 years ago

I have a custom TTS engine based on the Windows 10 engine. I just need to know how to connect it to the aeneas library.

readbeyond commented 4 years ago

@Liranbz in aeneas/extra you can find examples of custom TTS wrappers: https://github.com/readbeyond/aeneas/tree/master/aeneas/extra . Note that you probably need to write a small python wrapper on top of your ptts.vbs script.
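
The error in the original report ("EOL while scanning string literal (ptts.vbs, line 1)") is a Python syntax error, which is consistent with aeneas trying to load the file given as tts_path as Python source. That would explain why a .vbs file cannot be passed directly. A minimal sketch of that failure, assuming (hypothetically) that ptts.vbs begins with a VBScript comment, which starts with an apostrophe; the sample line below is made up, not the real file content:

```python
# Hypothetical first line of a VBScript file (apostrophe = VBScript comment).
vbs_source = "' ptts.vbs text to speech helper\n"


def try_load_as_python(source, filename):
    """Return the SyntaxError text Python reports for this source, or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return "{} ({}, line {})".format(exc.msg, exc.filename, exc.lineno)
    return None


if __name__ == "__main__":
    # To Python, the apostrophe opens a string literal that never closes.
    # The exact message wording depends on the Python version.
    print(try_load_as_python(vbs_source, "ptts.vbs"))
```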

Liranbz commented 4 years ago

Hi @readbeyond, thank you for your answer. I tried to use the CTW examples located in the /extra directory. Can you explain which ptts.vbs script I need to change?

Currently, I can use this command line to run the Windows TTS engine for the Hebrew language:

cscript "C:\Program Files\Jampal\ptts.vbs" < C:\Users\liran_bz\Desktop\read_file.txt

When I run this, the computer starts to speak.

Another issue: this script can only read ANSI-encoded files, not UTF-8.

I tried hard-coding it in the CTW script, but it's not working. Can you help me with this issue? Thank you!
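
For the ANSI limitation, one possible workaround is to re-encode the text before piping it to the script. The sketch below assumes that "ANSI" on a Hebrew Windows installation means the Windows-1255 code page (Python codec cp1255); that is an assumption to verify on the actual machine. The helper names are hypothetical, and the speak function only works on Windows with Jampal installed at the path used in this thread.

```python
import subprocess

# Path taken from the command shown in this thread.
PTTS_PATH = r"C:\Program Files\Jampal\ptts.vbs"


def to_ansi_bytes(text, codepage="cp1255"):
    """Encode text for a reader that expects the Windows Hebrew ANSI code page.

    Characters missing from the code page become "?" instead of raising.
    """
    return text.encode(codepage, errors="replace")


def speak(text):
    """Pipe ANSI-encoded text to ptts.vbs on stdin, like `cscript ptts.vbs < file`."""
    subprocess.run(
        ["cscript", "//Nologo", PTTS_PATH],
        input=to_ansi_bytes(text),
        check=True,
    )


if __name__ == "__main__":
    # Only the encoding step runs here; speak() needs Windows and Jampal.
    print(to_ansi_bytes("שלום"))
```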

readbeyond commented 3 years ago

@Liranbz unfortunately I cannot provide specific help, as I do not have a Windows machine. You can probably try copying and modifying https://github.com/readbeyond/aeneas/blob/master/aeneas/extra/ctw_espeak.py

Please also read the README in https://github.com/readbeyond/aeneas/tree/master/aeneas/extra
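
A rough sketch of what such a wrapper could look like, modeled on ctw_espeak.py and untested on Windows. Everything specific to ptts is an assumption to check against the Jampal documentation: the flags -w (write a wave file), -u (read text from a file) and -e (input encoding), and the output format declared in OUTPUT_AUDIO_FORMAT. The file name ctw_ptts.py and the TAG value are hypothetical. The try/except only exists so the file can be inspected on a machine without aeneas installed.

```python
# Hypothetical file ctw_ptts.py, modeled on aeneas/extra/ctw_espeak.py.
try:
    from aeneas.ttswrappers.basettswrapper import BaseTTSWrapper
except ImportError:
    class BaseTTSWrapper(object):
        """Stand-in that mimics only the few members used below."""
        CLI_PARAMETER_TEXT_PATH = "TEXT_PATH"
        CLI_PARAMETER_WAVE_PATH = "WAVE_PATH"

        def __init__(self, rconf=None, logger=None):
            self.subprocess_arguments = []

        def set_subprocess_arguments(self, subprocess_arguments):
            self.subprocess_arguments = list(subprocess_arguments)

# Path taken from the command shown in this thread.
PTTS_PATH = u"C:\\Program Files\\Jampal\\ptts.vbs"


class CustomTTSWrapper(BaseTTSWrapper):
    """Custom wrapper; the class name follows ctw_espeak.py."""
    HEB = u"heb"
    # Assumption: the Windows voice is chosen by the system, so the
    # voice code is not passed on the command line.
    LANGUAGE_TO_VOICE_CODE = {HEB: u"heb"}
    DEFAULT_LANGUAGE = HEB
    # Assumption: must match what ptts actually writes; verify first.
    OUTPUT_AUDIO_FORMAT = ("pcm_s16le", 1, 22050)
    HAS_SUBPROCESS_CALL = True
    TAG = u"CustomTTSWrapperPTTS"

    def __init__(self, rconf=None, logger=None):
        super(CustomTTSWrapper, self).__init__(rconf=rconf, logger=logger)
        self.set_subprocess_arguments([
            u"cscript", u"//Nologo", PTTS_PATH,
            u"-w", self.CLI_PARAMETER_WAVE_PATH,  # placeholder for the output wave path
            u"-u", self.CLI_PARAMETER_TEXT_PATH,  # placeholder for the input text path
            u"-e", u"UTF-8",                      # assumed ptts encoding flag
        ])
```

If this works, the run command would point tts_path at the Python file instead of the .vbs file, for example -r="tts=custom|tts_path=C:\path\to\ctw_ptts.py" (hypothetical path). Reading the text from a file with an explicit encoding, rather than from stdin, is meant to sidestep the ANSI-only stdin behavior reported above.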