rhasspy / piper

A fast, local neural text to speech system
https://rhasspy.github.io/piper-samples/
MIT License

How to convert the vits model to onnx? #9

Closed cgisky1980 closed 1 year ago

cgisky1980 commented 1 year ago

Or is Chinese support available? Thanks.

synesthesiam commented 1 year ago

The export_onnx.py script in src/python/larynx_train will do this. Any suggestions on a good Chinese TTS dataset?

cgisky1980 commented 1 year ago

https://huggingface.co/bert-base-chinese

cgisky1980 commented 1 year ago

https://www.aishelltech.com/aishell_3

cgisky1980 commented 1 year ago

https://drive.google.com/file/d/1Blg8OD_YvFrpyrYys7OoXM4fcarVRLft/view?usp=sharing

synesthesiam commented 1 year ago

Thank you. Do you happen to know the license of this dataset, and how the phonetic sounds were produced? I usually use espeak-ng to get phonemes for my voices, but comparing its output to the sounds in transcripts.txt shows a lot of differences.

cgisky1980 commented 1 year ago

The dataset is from https://github.com/PlayVoice/HuaYan_TTS

cgisky1980 commented 1 year ago

Try espeak.

tuannvhust commented 1 year ago

@synesthesiam I successfully converted my model to ONNX, but I ran into a problem. When the input has the same size as the dummy input, everything works. When the input has a different size, I get this error: InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Non-zero status code returned while running Expand node. Name:'Expand_2865' Status Message: invalid expand shape. Do you have any idea how to solve it?

synesthesiam commented 1 year ago

Did you mark the text, text length, and output axes as dynamic?
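For readers hitting the same Expand error, here is a minimal sketch of what marking those axes as dynamic can look like with torch.onnx.export. The tensor names (input, input_lengths, output), the axis labels, the default opset, and the export_vits helper are assumptions for illustration and are not confirmed to match export_onnx.py.

```python
# Sketch only: tensor names and axis labels are illustrative assumptions.
INPUT_NAMES = ["input", "input_lengths"]
OUTPUT_NAMES = ["output"]

# Map each tensor name to the axes whose size may change between calls.
DYNAMIC_AXES = {
    "input": {0: "batch_size", 1: "phonemes"},
    "input_lengths": {0: "batch_size"},
    "output": {0: "batch_size", 1: "time"},
}


def export_vits(model, dummy_input, onnx_path, opset_version=13):
    """Export `model` to ONNX with variable-length text and audio axes.

    `model` is assumed to be a torch.nn.Module whose forward takes
    (text, text_lengths); `dummy_input` is a matching tuple of tensors.
    """
    # Imported lazily so the axis table can be inspected without PyTorch.
    import torch

    model.eval()
    with torch.no_grad():
        torch.onnx.export(
            model,
            dummy_input,
            onnx_path,
            opset_version=opset_version,
            input_names=INPUT_NAMES,
            output_names=OUTPUT_NAMES,
            dynamic_axes=DYNAMIC_AXES,
        )
```

Axes that are not listed in dynamic_axes are exported with the sizes of the dummy input, so a useful check is to run the exported model with a text length that differs from the dummy input. This sketch does not rule out other causes of the error, such as shapes that were traced as constants inside the model.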

tuannvhust commented 1 year ago

@synesthesiam Yes, I did. Other people have reported the same problem. Have you experienced it as well?

synesthesiam commented 1 year ago

I haven't experienced that issue, no. What version of PyTorch are you running?

tuannvhust commented 1 year ago

I use PyTorch 1.8.1.

cgisky1980 commented 1 year ago

To get phonemes for Chinese voices with espeak-ng, copy the file "zh_listx" from the "dictsource/extra" directory into "dictsource" and run "espeak-ng --compile=cmn".
After that, you can run "espeak-ng.exe -X -v cmn -f test.txt" to get the phonemes for Chinese text.
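The phoneme lookup above could be scripted roughly as follows. This is a sketch that assumes an espeak-ng binary with the cmn dictionary already compiled is on the PATH; the helper names are hypothetical, and the -q flag (no audio output) is an addition that is not part of the command in this comment.

```python
import subprocess


def espeak_phoneme_command(text_file, voice="cmn", executable="espeak-ng"):
    """Build the espeak-ng argument list for phoneme output."""
    # -X: write phoneme mnemonics and the translation trace to stdout
    # -v: voice name, -f: read the text from a file
    # -q: produce no audio (added here, not in the original command)
    return [executable, "-q", "-X", "-v", voice, "-f", str(text_file)]


def espeak_phonemes(text_file, voice="cmn", executable="espeak-ng"):
    """Run espeak-ng and return everything it wrote to stdout."""
    result = subprocess.run(
        espeak_phoneme_command(text_file, voice, executable),
        check=True,           # raise if espeak-ng exits with an error
        capture_output=True,  # collect stdout instead of printing it
        encoding="utf-8",
    )
    return result.stdout
```

Calling espeak_phonemes("test.txt") would then return the same text the command line prints, which can be compared against the sounds in transcripts.txt.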

synesthesiam commented 1 year ago

@cgisky1980 Is it OK to use cmn_listx? https://github.com/espeak-ng/espeak-ng/tree/master/dictsource/extra

cgisky1980 commented 1 year ago

Yes. The file was called zh_listx in older versions and is now cmn_listx.

synesthesiam commented 1 year ago

Thanks! Model is training now.

cgisky1980 commented 1 year ago

great! love u

synesthesiam commented 1 year ago

Got a small model trained: https://github.com/rhasspy/larynx2/releases/download/v0.0.2/voice-zh-cn_huayan.tar.gz I will train a larger model later 👍

cgisky1980 commented 1 year ago

And espeak-ng-data must be updated as well :)

cgisky1980 commented 1 year ago

/www/src/larynx$ ./larynx --help

usage: ./larynx [options]

options:
   -h        --help              show this message and exit
   -m  FILE  --model       FILE  path to onnx model file
   -c  FILE  --config      FILE  path to model config file (default: model path + .json)
   -f  FILE  --output_file FILE  path to output WAV file ('-' for stdout)
   -d  DIR   --output_dir  DIR   path to output directory (default: cwd)
   -s  NUM   --speaker     NUM   id of speaker (default: 0)
   --noise-scale           NUM   generator noise (default: 0.667)
   --length-scale          NUM   phoneme length (default: 1.0)
   --noise-w               NUM   phonene width noise (default: 0.8)

How to specify the language?

cgisky1980 commented 1 year ago

I tried ./larynx --model zh-cn-huayan-low.onnx --output_file 1.wav -c zh-cn-huayan-low.onnx.json, but the resulting WAV seems to be wrong.

cgisky1980 commented 1 year ago

Oh, I see. The file "libespeak1-ng.so.1.1.51" must be updated: espeak-ng has to be compiled with the flag --with-extdict-cmn.

Okay, okay, now it is working.

synesthesiam commented 1 year ago

Thanks, I'll add --with-extdict-cmn to the build script and upload a new version.

cgisky1980 commented 1 year ago

The export_onnx.py script converts a .ckpt file to ONNX, but I have a .pth file saved with torch. I will send you an email.

wxqwinner commented 1 year ago

@synesthesiam

You mentioned training "a larger model later". Is there any progress?

synesthesiam commented 1 year ago

Sure, I added one here: https://github.com/rhasspy/piper/releases/download/v0.0.2/voice-zh_CN-huayan-medium.tar.gz