Bad performance with Japanese

Hello,

I'm trying to align a Japanese audiobook with the text. My setup:

Linux (Ubuntu 20.04)
Python 3.8.10
aeneas 1.7.3.0
eSpeak NG 1.50
MBROLA 3.02b

The resulting alignment is very poor, even on a short extract of only a few sentences.

I run aeneas like so:

python3 -m aeneas.tools.execute_task \
    "extract.mp3" \
    "extract.txt" \
    "task_language=JA|os_task_file_format=aud|is_text_type=plain" \
    -r="tts=espeak-ng|allow_unlisted_languages=True" \
    "extract_markers.txt" \
    --output-html
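
To put a number on how bad the result is, a minimal sketch like the one below could scan the output file for implausible fragments. It is not part of aeneas. It assumes the aud output is a tab-separated Audacity label file (start, end, label, times in seconds) and that each label holds the fragment text. The helper names and the characters-per-second thresholds are hypothetical and would need tuning.

```python
import io
from typing import Iterable, List, Tuple

Fragment = Tuple[float, float, str]


def read_aud_labels(lines: Iterable[str]) -> List[Fragment]:
    """Parse lines of the assumed form: start<TAB>end<TAB>label."""
    fragments = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        start, end, label = line.split("\t", 2)
        fragments.append((float(start), float(end), label))
    return fragments


def flag_fragments(fragments: List[Fragment],
                   min_rate: float = 1.0,
                   max_rate: float = 30.0) -> List[int]:
    """Return indices of fragments whose characters-per-second rate is
    outside [min_rate, max_rate]. The thresholds are guesses, not values
    taken from aeneas."""
    flagged = []
    for index, (start, end, label) in enumerate(fragments):
        duration = end - start
        if duration <= 0:
            flagged.append(index)  # empty or inverted interval
            continue
        rate = len(label) / duration
        if rate < min_rate or rate > max_rate:
            flagged.append(index)
    return flagged


# Small inline demo with made-up timings (not real aeneas output).
demo = io.StringIO("0.000\t4.000\tabcdefgh\n4.000\t4.000\txyz\n")
demo_fragments = read_aud_labels(demo)
demo_flagged = flag_fragments(demo_fragments)
```

For the real file, the same functions could be called on `open("extract_markers.txt", encoding="utf-8")`. A long run of flagged fragments would point to a systematic alignment problem rather than a few bad boundaries.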

The text has one sentence per line and looks like this:

僕の「自己理解メソッド」を説明する前に、まずは「やりたいこと」探しに関する迷信を解いておきます。
ここで紹介する「5つの間違い」を持ったまま「やりたいこと探し」を進めても、やりたいことは見つかりません。
けれどこの間違いにハマってしまっている人が実に多いのです。
...

I also tried converting the text to kana (ぼくの, ボクノ) or to phonemes ([[b o k u ]][[n o ]]), but the alignment did not improve.

I'm not sure, but this may be related to an issue in espeak-ng; see my comment here: https://github.com/espeak-ng/espeak-ng/issues/566#issuecomment-880100908

Any help would be greatly appreciated, thank you!
