readbeyond / aeneas

aeneas is a Python/C library and a set of tools to automagically synchronize audio and text (aka forced alignment)
http://www.readbeyond.it/aeneas/
GNU Affero General Public License v3.0

Bad performances with Japanese #276

Closed: itsupera closed this issue 3 years ago.

itsupera commented 3 years ago

Hello,

I'm trying to align a Japanese audiobook with the text. My setup:

The alignment is very poor, even on a short extract of only a few sentences.

I run aeneas like so:

python3 -m aeneas.tools.execute_task \
    "extract.mp3" \
    "extract.txt" \
    "task_language=JA|os_task_file_format=aud|is_text_type=plain" -r="tts=espeak-ng|allow_unlisted_languages=True" \
    "extract_markers.txt" \
    --output-html
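The third positional argument is a task configuration string, and `-r` takes a runtime configuration string. Both are `key=value` pairs joined by `|`. The following is a minimal sketch that assembles the same invocation as an argument list (for example, to pass to `subprocess.run`). The helpers `build_config` and `build_command` are hypothetical and not part of aeneas. The sketch places `-r` after the output path. Parameter values are copied from the command above.

```python
import shlex
from typing import Dict, List


def build_config(params: Dict[str, str]) -> str:
    """Join key=value pairs with '|', the separator used in aeneas config strings."""
    for key, value in params.items():
        # A stray '|' (or '=' in a key) would silently corrupt the config string.
        if "|" in key or "|" in value or "=" in key:
            raise ValueError(f"invalid config entry: {key!r}={value!r}")
    return "|".join(f"{key}={value}" for key, value in params.items())


def build_command(audio: str, text: str, output: str, language: str, tts: str) -> List[str]:
    """Return an argv list equivalent to the shell command above (hypothetical helper)."""
    task_config = build_config({
        "task_language": language,
        "os_task_file_format": "aud",  # Audacity label output
        "is_text_type": "plain",       # one text fragment per line
    })
    runtime_config = build_config({
        "tts": tts,
        "allow_unlisted_languages": "True",
    })
    return [
        "python3", "-m", "aeneas.tools.execute_task",
        audio, text, task_config, output,
        f"-r={runtime_config}",
        "--output-html",
    ]


if __name__ == "__main__":
    argv = build_command("extract.mp3", "extract.txt", "extract_markers.txt", "JA", "espeak-ng")
    print(shlex.join(argv))
    # To actually run it (requires aeneas and the chosen TTS engine):
    # import subprocess; subprocess.run(argv, check=True)
```

Building an argument list instead of a single shell string avoids quoting problems with the `|` characters, which a shell would otherwise interpret as pipes.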

The text has one sentence per line and looks like this:

僕の「自己理解メソッド」を説明する前に、まずは「やりたいこと」探しに関する迷信を解いておきます。
ここで紹介する「5つの間違い」を持ったまま「やりたいこと探し」を進めても、やりたいことは見つかりません。
けれどこの間違いにハマってしまっている人が実に多いのです。
...
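If the source text arrives as running paragraphs, you need to split it so that each line holds one sentence. The following is a heuristic sketch that breaks Japanese prose after 。, ！ or ？, keeping any closing bracket such as 」 with the preceding sentence. It then writes the result as UTF-8, one sentence per line. `split_sentences` and `write_plain_text` are hypothetical helpers, not aeneas functions. The sketch does not handle every case: a quoted sentence followed by a verb, such as 「はい。」と言った, is split into two lines.

```python
import re
from typing import List

# One or more sentence-final marks, plus any closing brackets right after them (heuristic).
_SENTENCE_END = re.compile(r"[。！？!?]+[」』）)]*")


def split_sentences(paragraph: str) -> List[str]:
    """Split Japanese prose into sentences (hypothetical helper, not part of aeneas)."""
    # Japanese puts no spaces between words, so hard line breaks can simply be dropped.
    paragraph = paragraph.replace("\r", "").replace("\n", "")
    sentences: List[str] = []
    start = 0
    for match in _SENTENCE_END.finditer(paragraph):
        sentence = paragraph[start:match.end()].strip()
        if sentence:
            sentences.append(sentence)
        start = match.end()
    tail = paragraph[start:].strip()  # trailing text that has no terminator
    if tail:
        sentences.append(tail)
    return sentences


def write_plain_text(path: str, paragraphs: List[str]) -> None:
    """Write one sentence per line in UTF-8, the layout used with is_text_type=plain."""
    with open(path, "w", encoding="utf-8") as handle:
        for paragraph in paragraphs:
            for sentence in split_sentences(paragraph):
                handle.write(sentence + "\n")
```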

I also tried converting the text into kana (hiragana ぼくの, katakana ボクノ) or into eSpeak phoneme notation ([[b o k u ]][[n o ]]), but the alignment did not improve.
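Both alternative inputs above are mechanical to produce once you have a reading. In Unicode, hiragana U+3041 to U+3096 sit exactly 0x60 code points below their katakana counterparts, so ぼくの becomes ボクノ by a fixed offset. eSpeak reads phoneme mnemonics enclosed in `[[ ]]`. The following sketch covers only those two formatting steps. Obtaining the reading from kanji, or the phoneme symbols from kana, is not shown. The symbols must exist in the phoneme table of the voice you use. `hiragana_to_katakana` and `to_espeak_phonemes` are hypothetical helpers.

```python
from typing import Iterable, List

# Hiragana U+3041..U+3096 map onto katakana U+30A1..U+30F6 by adding 0x60.
_HIRAGANA_START, _HIRAGANA_END = 0x3041, 0x3096
_KANA_OFFSET = 0x60


def hiragana_to_katakana(text: str) -> str:
    """Shift hiragana to katakana; kanji, punctuation and other characters are left unchanged."""
    return "".join(
        chr(ord(ch) + _KANA_OFFSET) if _HIRAGANA_START <= ord(ch) <= _HIRAGANA_END else ch
        for ch in text
    )


def to_espeak_phonemes(words: Iterable[List[str]]) -> str:
    """Wrap each word's phoneme symbols in [[...]], one space after each symbol as in the issue."""
    return "".join("[[" + "".join(symbol + " " for symbol in word) + "]]" for word in words)
```

For example, `to_espeak_phonemes([["b", "o", "k", "u"], ["n", "o"]])` reproduces the `[[b o k u ]][[n o ]]` string quoted above.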

I'm not sure, but this may be related to an issue with espeak-ng; see my comment here: https://github.com/espeak-ng/espeak-ng/issues/566#issuecomment-880100908

Any help would be greatly appreciated. Thank you!

itsupera commented 3 years ago

The cause of the poor performance was indeed espeak-ng: after switching the TTS engine to AWS (Amazon Polly), everything worked perfectly.
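To put a number on the difference between the two engines, you can run the same task twice, once with `tts=espeak-ng` and once with `tts=aws`, and compare the two `aud` output files fragment by fragment. The following sketch parses tab-separated `begin`, `end`, `label` lines in the Audacity label layout. It reports the mean and maximum gap between fragment start times. It assumes both files contain the same number of fragments, which should hold because fragments come from the lines of the same text file. It treats the AWS run as the reference only because this issue reports it as correct; hand-checked labels would be a better reference. `parse_audacity_labels`, `compare_begins` and `Label` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Label:
    begin: float  # seconds
    end: float    # seconds
    name: str


def parse_audacity_labels(text: str) -> List[Label]:
    """Parse 'begin<TAB>end<TAB>label' lines, the Audacity label-track layout."""
    labels: List[Label] = []
    for line in text.splitlines():
        # Skip blank lines and Audacity's optional frequency lines, which start with a backslash.
        if not line.strip() or line.startswith("\\"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"malformed label line: {line!r}")
        name = parts[2] if len(parts) > 2 else ""
        labels.append(Label(float(parts[0]), float(parts[1]), name))
    return labels


def compare_begins(a: List[Label], b: List[Label]) -> Tuple[float, float]:
    """Return the mean and maximum absolute difference of fragment start times, in seconds."""
    if not a or len(a) != len(b):
        raise ValueError("label lists must be non-empty and of equal length")
    diffs = [abs(x.begin - y.begin) for x, y in zip(a, b)]
    return sum(diffs) / len(diffs), max(diffs)
```

In practice, read each output file with `encoding="utf-8"`, pass the contents to `parse_audacity_labels`, and inspect the fragments with the largest gaps in the HTML produced by `--output-html`.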