open-mmlab / Amphion

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
https://openhlt.github.io/amphion/
MIT License
7.19k stars, 534 forks

RuntimeError: failed to load voice "ja" #323

Open zachysaur opened 3 days ago

zachysaur commented 3 days ago

(venv) F:\maskgct\maskgct>python app.py
./models/tts/maskgct/g2p\sources\g2p_chinese_model\poly_bert_model.onnx
Error: Could not load the specified mbrola voice file.
Error: Could not load the specified mbrola voice file.
Traceback (most recent call last):
  File "F:\maskgct\maskgct\app.py", line 20, in <module>
    from models.tts.maskgct.g2p.g2p_generation import g2p, chn_eng_g2p
  File "F:\maskgct\maskgct\models\tts\maskgct\g2p\g2p_generation.py", line 10, in <module>
    from models.tts.maskgct.g2p.utils.g2p import phonemizer_g2p
  File "F:\maskgct\maskgct\models\tts\maskgct\g2p\utils\g2p.py", line 30, in <module>
    phonemizer_ja = EspeakBackend(
  File "F:\maskgct\maskgct\venv\lib\site-packages\phonemizer\backend\espeak\espeak.py", line 49, in __init__
    self._espeak.set_voice(language)
  File "F:\maskgct\maskgct\venv\lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 249, in set_voice
    raise RuntimeError(  # pragma: nocover
RuntimeError: failed to load voice "ja"

(venv) F:\maskgct\maskgct>

yuantuo666 commented 2 days ago

Hi, MaskGCT was built in a Linux environment. For a smoother experience, we recommend using Linux to reproduce it.

If you are having problems configuring the environment on a Windows machine, you can try following this blog post: https://www.cnblogs.com/v3ucn/p/18511187

zelenooki87 commented 2 days ago

@zachysaur I had the same issue on Windows. I solved it by replacing the phonemizer files with the ones from this fixed commit: https://github.com/bootphon/phonemizer/tree/b2db56adceef42b9a20c8ffb4d49868f630b88a1/phonemizer

After that, if you get a Unicode character error, turn on the UTF-8 (beta) option under "Language for non-Unicode programs" in the Windows regional and language settings.

If you get an mbrola DLL error, put the two files from the attached zip (mbrola.zip) into:

C:\Program Files (x86)\eSpeak\command_line

It should now work.
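To confirm whether the steps above took effect, a small check along these lines may help. It is only a sketch: it assumes phonemizer is importable in the active venv and that eSpeak is installed at the path given above, and the helper names `find_dlls` and `can_load_voice` are made up for illustration. `can_load_voice` repeats the `EspeakBackend(...)` call that fails in the traceback, so it should return True once the voice loads.

```python
from pathlib import Path

# Path taken from the comment above; adjust it if eSpeak lives elsewhere.
ESPEAK_CMD_DIR = r"C:\Program Files (x86)\eSpeak\command_line"


def find_dlls(directory):
    """Return the sorted names of .dll files directly inside `directory`.

    Returns an empty list if the directory does not exist.
    """
    d = Path(directory)
    if not d.is_dir():
        return []
    return sorted(p.name for p in d.iterdir() if p.suffix.lower() == ".dll")


def can_load_voice(language):
    """Try the same call that fails in the traceback.

    Returns True if the voice loads, False if phonemizer raises,
    and None if phonemizer itself is not installed.
    """
    try:
        from phonemizer.backend import EspeakBackend
    except ImportError:
        return None
    try:
        EspeakBackend(language)
    except Exception:
        # phonemizer raises RuntimeError('failed to load voice ...') here
        return False
    return True


if __name__ == "__main__":
    print("DLLs next to espeak:", find_dlls(ESPEAK_CMD_DIR))
    print('Can load voice "ja":', can_load_voice("ja"))
```

If the second line prints False, the failure is in the phonemizer/eSpeak setup rather than in MaskGCT itself.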

zachysaur commented 1 day ago

Still the same error, even after following everything in the blog post linked above:

./models/tts/maskgct/g2p\sources\g2p_chinese_model\poly_bert_model.onnx
2024-11-03 08:38:00.9068680 [E:onnxruntime:Default, provider_bridge_ort.cc:1862 onnxruntime::TryGetProviderInfo_CUDA] D:\a_work\1\s\onnxruntime\core\session\provider_bridge_ort.cc:1539 onnxruntime::ProviderLibrary::Get [ONNXRuntimeError] : 1 : FAIL : LoadLibrary failed with error 126 "" when trying to load "F:\gct\Amphion\venv\Lib\site-packages\onnxruntime\capi\onnxruntime_providers_cuda.dll"

2024-11-03 08:38:00.9208389 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:993 onnxruntime::python::CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Require cuDNN 9.* and CUDA 12.*, and the latest MSVC runtime. Please install all dependencies as mentioned in the GPU requirements page (https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements), make sure they're in the PATH, and that your GPU is supported.
Error: Could not load the specified mbrola voice file.
Error: Could not load the specified mbrola voice file.
Traceback (most recent call last):
  File "F:\gct\Amphion\1.py", line 1, in <module>
    from models.tts.maskgct.maskgct_utils import *
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "F:\gct\Amphion\models\tts\maskgct\maskgct_utils.py", line 20, in <module>
    from models.tts.maskgct.g2p.g2p_generation import g2p, chn_eng_g2p
  File "F:\gct\Amphion\models\tts\maskgct\g2p\g2p_generation.py", line 10, in <module>
    from models.tts.maskgct.g2p.utils.g2p import phonemizer_g2p
  File "F:\gct\Amphion\models\tts\maskgct\g2p\utils\g2p.py", line 30, in <module>
    phonemizer_ja = EspeakBackend(
                    ^^^^^^^^^^^^^^
  File "F:\gct\Amphion\venv\Lib\site-packages\phonemizer\backend\espeak\espeak.py", line 49, in __init__
    self._espeak.set_voice(language)
  File "F:\gct\Amphion\venv\Lib\site-packages\phonemizer\backend\espeak\wrapper.py", line 249, in set_voice
    raise RuntimeError(  # pragma: nocover
RuntimeError: failed to load voice "ja"

zelenooki87 commented 1 day ago

You could try this repo. It worked correctly for me on Windows: https://github.com/justinjohn0306/MaskGCT-Windows

The error message says that CUDA 12.x, cuDNN (plus zlib.dll), and the MSVC build tools are not in your PATH. If that is the case, you can install the default (CPU) onnxruntime package instead. Alternatively, if you install all the CUDA dependencies properly and add them to your PATH variable, you can install onnxruntime-gpu for faster inference. Uninstall PyTorch and reinstall the GPU build. That's all.
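For the onnxruntime part, one way to see which execution providers the installed package reports, and to fall back to CPU when CUDA is not listed, is sketched below. The helper names `choose_providers` and `available_providers` are hypothetical. Note that `onnxruntime.get_available_providers()` lists what the package was built with, so `CUDAExecutionProvider` may still appear when the CUDA or cuDNN DLLs cannot be loaded, as in the log above.

```python
def choose_providers(available):
    """Prefer CUDA when it is reported, and always keep a CPU fallback."""
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


def available_providers():
    """Return onnxruntime's provider list, or [] if onnxruntime is missing."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


if __name__ == "__main__":
    providers = choose_providers(available_providers())
    print("Providers to request:", providers)
    # The list can then be passed when creating a session, for example:
    #   onnxruntime.InferenceSession(model_path, providers=providers)
```

Passing only `["CPUExecutionProvider"]` avoids the CUDA load attempt entirely, which may be the simplest option until the GPU dependencies are in PATH.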