pengpengtao opened 4 months ago
Yes.
I suggest that you have a look at https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/melo-tts/export-onnx.py
We support exporting MeloTTS models to onnx.
You can use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/melo-tts/test.py to test your exported onnx model.
Furthermore, we also provide a C++ runtime for it and support 10 programming languages.
You can try the exported MeloTTS Chinese+English ONNX model on your Android phone by downloading the APK from
https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html
sherpa-onnx-1.10.16-armeabi-v7a-zh_en-tts-engine-vits-melo-tts-zh_en.apk
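As a rough sketch of what the inference setup in test.py looks like with onnxruntime: every input name, shape, and value below is an assumption for illustration only; inspect the exported model (e.g. with show-info.py) for the actual input names.

```python
import numpy as np

# Token IDs looked up from tokens.txt (values here are placeholders).
x = np.array([[0, 12, 0, 35, 0, 20, 0]], dtype=np.int64)
x_lengths = np.array([x.shape[1]], dtype=np.int64)
tones = np.zeros_like(x)                          # per-token tone IDs
sid = np.array([0], dtype=np.int64)               # speaker ID
noise_scale = np.array([0.667], dtype=np.float32)
length_scale = np.array([1.0], dtype=np.float32)  # <1.0 speaks faster, >1.0 slower

# With onnxruntime installed and model.onnx exported, inference would look like:
#   import onnxruntime as ort
#   sess = ort.InferenceSession("model.onnx")
#   inputs = {"x": x, "x_lengths": x_lengths, "tones": tones, "sid": sid,
#             "noise_scale": noise_scale, "length_scale": length_scale}
#   audio = sess.run(None, inputs)[0]
```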
@csukuangfj Good news! Are there any plans for a Korean ONNX version?
You can have a look at how we convert the Chinese+English model; the procedure for converting the Korean model should be similar.
We already have a Korean TTS model at https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-mimic3-ko_KO-kss_low.tar.bz2, so we don't plan to convert the Korean model from MeloTTS soon.
@csukuangfj I will analyze the Chinese + English version and attempt to perform inference on the Korean ONNX version. Thank you!
@eehoeskrap Any success?
@csukuangfj
I tried to convert the Japanese model by editing export-onnx.py#L225 from ZH to JP, but it was not successful.
Could you help me check? Thank you.
Output file: test.wav
It does not seem to be working.
The build logs are as follows:
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
torch.Size([60])
/tmp/MeloTTS/melo/attentions.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
t_s == t_t
/tmp/MeloTTS/melo/attentions.py:340: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
pad_length = max(length - (self.window_size + 1), 0)
/tmp/MeloTTS/melo/attentions.py:341: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
slice_start_position = max((self.window_size + 1) - length, 0)
/tmp/MeloTTS/melo/attentions.py:343: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if pad_length > 0:
/tmp/MeloTTS/melo/transforms.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if torch.min(inputs) < left or torch.max(inputs) > right:
/tmp/MeloTTS/melo/transforms.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if min_bin_width * num_bins > 1.0:
/tmp/MeloTTS/melo/transforms.py:121: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if min_bin_height * num_bins > 1.0:
/tmp/MeloTTS/melo/transforms.py:171: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
assert (discriminant >= 0).all()
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
_C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/symbolic_opset10.py:531: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
return g.op("Constant", value_t=torch.tensor(list_or_value))
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/utils.py:1208: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
_C._jit_pass_onnx_graph_shape_type_inference(
Sorry, the information you have given is too limited.
Please see https://github.com/csukuangfj/sherpa-onnx/actions/runs/10537349904/job/29198651417#step:4:936
-rw-r--r-- 1 runner docker 163M Aug 24 09:13 model.onnx
You can see that by changing ZH to JP, you can successfully convert the model to ONNX.
(You need to handle tokens.txt and lexicon.txt for Japanese).
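The tokens.txt and lexicon.txt step mentioned above might be sketched as below. The file formats (one "symbol id" pair per line in tokens.txt; one "word phone1 phone2 ..." line per entry in lexicon.txt) are assumptions based on the Chinese+English example and should be verified against that model's files; the phone set and lexicon entry are placeholders.

```python
# Sketch of producing tokens.txt: one "symbol id" pair per line (format assumed).
symbols = ["_", "a", "i", "e", "o", "k", "n", "w", "ch", "N"]  # placeholder phone set
with open("tokens.txt", "w", encoding="utf-8") as f:
    for i, s in enumerate(symbols):
        f.write(f"{s} {i}\n")

# Sketch of producing lexicon.txt: each entry maps a word to its phone
# sequence (the entry below is illustrative, not a complete lexicon).
lexicon = {"こんにちは": ["k", "o", "N", "n", "i", "ch", "i", "w", "a"]}
with open("lexicon.txt", "w", encoding="utf-8") as f:
    for word, phones in lexicon.items():
        f.write(f"{word} {' '.join(phones)}\n")
```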
@csukuangfj I can run it and get model.onnx, as shown below:
➜ melo-tts git:(main) ✗ ls -la
total 347920
drwxr-xr-x 12 phamkhac staff 384 Aug 24 14:59 .
drwxr-xr-x 27 phamkhac staff 864 Aug 22 09:02 ..
-rw-r--r--@ 1 phamkhac staff 6148 Aug 24 14:59 .DS_Store
-rw-r--r-- 1 phamkhac staff 156 Aug 21 09:34 README.md
-rwxr-xr-x 1 phamkhac staff 8731 Aug 24 15:13 export-onnx.py
-rw-r--r--@ 1 phamkhac staff 6837671 Aug 24 15:20 lexicon.txt
-rw-r--r-- 1 phamkhac staff 170604200 Aug 24 15:21 model.onnx
-rwxr-xr-x 1 phamkhac staff 614 Aug 24 14:50 run.sh
-rwxr-xr-x 1 phamkhac staff 1637 Aug 21 09:34 show-info.py
-rwxr-xr-x 1 phamkhac staff 5196 Aug 24 15:18 test.py
-rw-r--r-- 1 phamkhac staff 70700 Aug 24 15:28 test.wav
-rw-r--r--@ 1 phamkhac staff 1440 Aug 24 15:20 tokens.txt
You need to handle tokens.txt and lexicon.txt for Japanese
Do you have any instructions for handling these files?
You have to figure them out by yourself. We have already provided an example for Chinese+English.
In the current implementation of MeloTTS, specifically in the export-onnx.py script, the BERT embeddings are set to a zero array. This approach potentially reduces the realism of the generated audio.
# Current implementation
bert = torch.zeros(x.shape[0], 1024, x.shape[1], dtype=torch.float32)
This simplification may be affecting the quality and naturalness of the synthesized speech.
Is there a way to integrate BERT embeddings with the ONNX model to improve the realism of the generated audio?
The main difficulty is the tokenizer part.
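For context on the model-side part of "integrating BERT embeddings", here is a minimal NumPy sketch of replacing the zero tensor with word-level features repeated per phone. The 1024-dim feature size matches the zero tensor above, but the word-to-phone alignment scheme (`word2ph`) and the helper itself are assumptions for illustration, not the actual MeloTTS implementation.

```python
import numpy as np

def make_bert_input(word_features, word2ph):
    """Repeat each word-level feature for each phone it expands to.

    word_features: (num_words, 1024) hidden states from a BERT-style encoder.
    word2ph[i]: number of phones produced by word i.
    Returns an array shaped (1, 1024, num_phones), matching the layout of
    the zero tensor in export-onnx.py.
    """
    per_phone = np.repeat(word_features, word2ph, axis=0)  # (num_phones, 1024)
    return per_phone.T[np.newaxis, :, :]

# Example with random stand-in features: 3 words expanding to 2+1+3 phones.
feats = np.random.randn(3, 1024).astype(np.float32)
bert = make_bert_input(feats, [2, 1, 3])
print(bert.shape)  # (1, 1024, 6)
```

The remaining work would be wiring a real tokenizer/encoder into this alignment, which is exactly the hard part noted above.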
Does this model support ONNX inference?