myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License

ONNX infer #164

Open pengpengtao opened 4 months ago

pengpengtao commented 4 months ago

Does this model support ONNX inference?

csukuangfj commented 4 months ago

Yes.

I suggest that you have a look at https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/melo-tts/export-onnx.py

We support exporting MeloTTS models to ONNX.

You can use https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/melo-tts/test.py to test your exported ONNX model.

Furthermore, we also provide a C++ runtime for it and support 10 programming languages.

You can try the exported MeloTTS Chinese+English ONNX model on your Android phone by downloading the APK from

https://k2-fsa.github.io/sherpa/onnx/tts/apk-engine.html

sherpa-onnx-1.10.16-armeabi-v7a-zh_en-tts-engine-vits-melo-tts-zh_en.apk

eehoeskrap commented 3 months ago

@csukuangfj Good news! Are there any plans for a Korean ONNX version?

csukuangfj commented 3 months ago

You can have a look at how we convert the Chinese+English model. The way for converting Korean should be similar.

csukuangfj commented 3 months ago

We already have a Korean TTS model at

https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-mimic3-ko_KO-kss_low.tar.bz2

so we don't plan to convert the Korean model from MeloTTS soon.

eehoeskrap commented 3 months ago

@csukuangfj I will analyze the Chinese + English version and attempt to perform inference on the Korean ONNX version. Thank you!

m-bain commented 3 months ago

@eehoeskrap any success?

khacpv commented 2 months ago

@csukuangfj I tried to convert the JP (Japanese) model by changing the language at export-onnx.py#L225 from ZH to JP, but it was not successful.

Could you help me check? Thank you.

The output file, test.wav, does not seem to work.

The build logs are as follows:

~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/nn/utils/weight_norm.py:28: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.
  warnings.warn("torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.")
torch.Size([60])
/tmp/MeloTTS/melo/attentions.py:278: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  t_s == t_t
/tmp/MeloTTS/melo/attentions.py:340: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  pad_length = max(length - (self.window_size + 1), 0)
/tmp/MeloTTS/melo/attentions.py:341: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  slice_start_position = max((self.window_size + 1) - length, 0)
/tmp/MeloTTS/melo/attentions.py:343: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if pad_length > 0:
/tmp/MeloTTS/melo/transforms.py:114: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if torch.min(inputs) < left or torch.max(inputs) > right:
/tmp/MeloTTS/melo/transforms.py:119: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min_bin_width * num_bins > 1.0:
/tmp/MeloTTS/melo/transforms.py:121: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  if min_bin_height * num_bins > 1.0:
/tmp/MeloTTS/melo/transforms.py:171: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  assert (discriminant >= 0).all()
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/_internal/jit_utils.py:307: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
  _C._jit_pass_onnx_node_shape_type_inference(node, params_dict, opset_version)
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/symbolic_opset10.py:531: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return g.op("Constant", value_t=torch.tensor(list_or_value))
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/utils.py:702: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(
~/user/.pyenv/versions/3.10.11/lib/python3.10/site-packages/torch/onnx/utils.py:1208: UserWarning: Constant folding - Only steps=1 can be constant folded for opset >= 10 onnx::Slice op. Constant folding not applied. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/torch/csrc/jit/passes/onnx/constant_fold.cpp:181.)
  _C._jit_pass_onnx_graph_shape_type_inference(

csukuangfj commented 2 months ago

Sorry, the information you have given is too limited.
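
A report like this is easier to act on when it says what is wrong with the output file, for example whether test.wav is silent, truncated, or unreadable. Below is a minimal sketch for collecting that information with Python's standard library. It assumes test.wav is 16-bit PCM, which the thread does not confirm, and `describe_wav` is a hypothetical helper name.

```python
import struct
import wave


def describe_wav(path):
    """Return basic facts about a WAV file (assumes 16-bit PCM samples)."""
    with wave.open(path, "rb") as f:
        channels = f.getnchannels()
        sample_width = f.getsampwidth()  # bytes per sample
        sample_rate = f.getframerate()
        num_frames = f.getnframes()
        raw = f.readframes(num_frames)

    if sample_width != 2:
        raise ValueError(f"expected 16-bit PCM, got {8 * sample_width}-bit")

    # Decode little-endian signed 16-bit samples.
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    peak = max((abs(s) for s in samples), default=0)
    return {
        "channels": channels,
        "sample_rate": sample_rate,
        "duration_s": num_frames / sample_rate if sample_rate else 0.0,
        "peak": peak,              # 0 means the file is pure silence
        "clipped": peak >= 32767,  # samples hit the 16-bit limit
    }
```

Calling `describe_wav("test.wav")` and posting the result alongside the logs would show whether the file is empty, silent, or simply shorter than expected.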

csukuangfj commented 2 months ago

Please see https://github.com/csukuangfj/sherpa-onnx/actions/runs/10537349904/job/29198651417#step:4:936

-rw-r--r-- 1 runner docker 163M Aug 24 09:13 model.onnx

As you can see, after changing the language to JP the model converts to ONNX successfully.

You still need to handle tokens.txt and lexicon.txt for Japanese.

khacpv commented 2 months ago

@csukuangfj I can run the export and get model.onnx, as follows:

➜  melo-tts git:(main) ✗ ls -la 
total 347920
drwxr-xr-x  12 phamkhac  staff        384 Aug 24 14:59 .
drwxr-xr-x  27 phamkhac  staff        864 Aug 22 09:02 ..
-rw-r--r--@  1 phamkhac  staff       6148 Aug 24 14:59 .DS_Store
-rw-r--r--   1 phamkhac  staff        156 Aug 21 09:34 README.md
-rwxr-xr-x   1 phamkhac  staff       8731 Aug 24 15:13 export-onnx.py
-rw-r--r--@  1 phamkhac  staff    6837671 Aug 24 15:20 lexicon.txt
-rw-r--r--   1 phamkhac  staff  170604200 Aug 24 15:21 model.onnx
-rwxr-xr-x   1 phamkhac  staff        614 Aug 24 14:50 run.sh
-rwxr-xr-x   1 phamkhac  staff       1637 Aug 21 09:34 show-info.py
-rwxr-xr-x   1 phamkhac  staff       5196 Aug 24 15:18 test.py
-rw-r--r--   1 phamkhac  staff      70700 Aug 24 15:28 test.wav
-rw-r--r--@  1 phamkhac  staff       1440 Aug 24 15:20 tokens.txt

"You need to handle tokens.txt and lexicon.txt for Japanese"

Do you have any instructions for handling these files?

csukuangfj commented 2 months ago

You have to figure them out by yourself. We have already provided you with an example for Chinese+English.
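
As a starting point, both files can be treated as plain-text tables. The sketch below assumes the layout used by the Chinese+English example: tokens.txt holds one `symbol id` pair per line, and lexicon.txt holds a word followed by its phonemes and then one tone per phoneme. That layout is an assumption and should be checked against the files the Chinese+English export produces. The helper names, symbols, word, and tones below are hypothetical; real Japanese entries would have to come from MeloTTS's Japanese text frontend.

```python
def format_tokens(symbols):
    """One 'symbol id' pair per line; the id is the symbol's index."""
    return "".join(f"{sym} {idx}\n" for idx, sym in enumerate(symbols))


def format_lexicon(entries):
    """One 'word phone... tone...' line per entry.

    entries maps a word to (phones, tones), with one tone per phone.
    """
    lines = []
    for word, (phones, tones) in entries.items():
        if len(phones) != len(tones):
            raise ValueError(
                f"{word!r}: {len(phones)} phones but {len(tones)} tones"
            )
        lines.append(" ".join([word, *phones, *map(str, tones)]) + "\n")
    return "".join(lines)


# Hypothetical data, for illustration only.
symbols = ["_", "a", "k", "s", "e"]
entries = {"酒": (["s", "a", "k", "e"], [0, 0, 0, 0])}

tokens_txt = format_tokens(symbols)
lexicon_txt = format_lexicon(entries)
```

Every phoneme that appears in lexicon.txt should also appear in tokens.txt; otherwise the runtime has no ID to feed to the model for that symbol.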

bensonbs commented 1 month ago

In the current implementation of MeloTTS, specifically in the export-onnx.py script, the BERT embeddings are set to a zero array. This approach potentially reduces the realism of the generated audio.

# Current implementation
bert = torch.zeros(x.shape[0], 1024, x.shape[1], dtype=torch.float32)

This simplification may be affecting the quality and naturalness of the synthesized speech.

Is there a way to integrate BERT embeddings with the ONNX model to improve the realism of the generated audio?

csukuangfj commented 1 month ago

The main difficulty is the tokenizer part.
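
To illustrate why the tokenizer matters: the zero tensor quoted above has one 1024-dimensional column per input symbol. Real BERT features start out with one vector per tokenizer token and have to be stretched to one vector per phoneme, which requires knowing how many phonemes each token covers. The sketch below shows that expansion on plain lists. The `word2ph` name and the repeat-per-token scheme are assumptions about how such an alignment is typically done, not code taken from export-onnx.py.

```python
def expand_to_phone_level(token_features, word2ph):
    """Repeat each token's feature vector once per phoneme it covers."""
    if len(token_features) != len(word2ph):
        raise ValueError("need exactly one phoneme count per token")
    phone_features = []
    for vec, n_phones in zip(token_features, word2ph):
        phone_features.extend([list(vec) for _ in range(n_phones)])
    return phone_features


def zero_features(num_phones, dim):
    """The placeholder: every phoneme gets an all-zero vector."""
    return [[0.0] * dim for _ in range(num_phones)]


# Toy example with 2-dimensional features instead of 1024.
token_features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
word2ph = [1, 2, 1]  # phonemes covered by each token (hypothetical)

aligned = expand_to_phone_level(token_features, word2ph)
placeholder = zero_features(sum(word2ph), dim=2)
```

Here the rows are phonemes and the columns are feature dimensions; the tensor in the quoted line uses the transposed layout, with the phoneme axis last. Producing `token_features` and `word2ph` at inference time means running the same tokenizer and BERT model outside the exported graph.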