eehoeskrap opened this issue 3 months ago
Could you tell us how to get the input for bert from texts?
Is there any C++ implementation for that?
In this code, you can get the bert value through the get_bert function. get_bert calls a different torch model for each language, and there is only a Python implementation. https://github.com/myshell-ai/MeloTTS/blob/144a0980fac43411153209cf08a1998e3c161e10/melo/utils.py#L22
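For readers landing here, a rough sketch of what a get_bert-style helper computes is shown below; the checkpoint name and the word2ph alignment are my assumptions, not the exact MeloTTS code:

```python
# Rough sketch (not the exact MeloTTS code) of how phone-level BERT
# features are typically produced: run a language-specific BERT over the
# text, then repeat each token's hidden state once per phone it covers
# (word2ph) so the result aligns with the phone sequence.
import torch
from transformers import AutoModel, AutoTokenizer

def get_bert_features(text: str, word2ph: list, name: str = "bert-base-uncased"):
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModel.from_pretrained(name)
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hidden = out.hidden_states[-1][0]  # (num_tokens, dim)
    assert len(word2ph) == hidden.shape[0]
    # repeat each token's vector for every phone it covers
    phone_level = torch.cat(
        [hidden[i].repeat(n, 1) for i, n in enumerate(word2ph)], dim=0
    )
    return phone_level.T  # (dim, num_phones), matching the model's bert input
```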
In your code, bert and ja_bert are passed as model inputs in ModelWrapper. https://github.com/k2-fsa/sherpa-onnx/blob/963aaba82b01a425ae8dcf0fdcff6b073a45686f/scripts/melo-tts/export-onnx.py#L172
So, even though I specified input_names as below when exporting to the ONNX model, there is no bert input in the resulting ONNX file.
```python
torch.onnx.export(
    torch_model,
    (
        x,
        x_lengths,
        sid,
        tones,
        lang_id,
        bert,
        ja_bert,
        sdp_ratio,
        noise_scale,
        noise_scale_w,
        length_scale,
    ),
    filename,
    opset_version=opset_version,
    input_names=[
        "x",
        "x_lengths",
        "sid",
        "tones",
        "lang_id",
        "bert",
        "ja_bert",
        "sdp_ratio",
        "noise_scale",
        "noise_scale_w",
        "length_scale",
    ],
    output_names=["y"],
    dynamic_axes={
        "x": {0: "N", 1: "L"},
        "x_lengths": {0: "N"},
        "tones": {0: "N", 1: "L"},
        "lang_id": {0: "N", 1: "L"},
        "bert": {0: "N", 1: "L", 2: "D"},
        "ja_bert": {0: "N", 1: "L", 2: "D"},
        "y": {0: "N", 1: "S", 2: "T"},
    },
)
```
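A likely explanation (my note, worth verifying against your wrapper): torch.onnx.export traces the model, and any input that never reaches the traced graph is dropped from the exported file, even if it is listed in input_names. If the wrapped forward() ignores bert, for example by building zeros internally the way the export script's ModelWrapper does, no bert input can appear. A minimal demonstration:

```python
# Minimal demo: an input that the traced graph never uses is dropped
# from the exported ONNX model even when listed in input_names.
import onnx
import torch

class Demo(torch.nn.Module):
    def forward(self, x, unused):
        return x * 2  # 'unused' never enters the graph

torch.onnx.export(
    Demo(),
    (torch.rand(2), torch.rand(2)),
    "demo.onnx",
    input_names=["x", "unused"],
    output_names=["y"],
)
m = onnx.load("demo.onnx")
print([i.name for i in m.graph.input])  # likely only ['x']; 'unused' is gone
```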
> Could you tell us how to get the input for bert from texts?
> Is there any C++ implementation for that?
Please have a look at this comment. That is the main obstacle. If you can fix it, then we can support bert.
> In this code, you can get the bert value through the get_bert function.
Yes, I know that. I am asking whether you know of a C++ implementation for that, or whether it is possible to implement it in C++.
As far as I know, there is currently no Korean C++ implementation of Bert. I will try it and let you know.
By the way, the main issue is about the tokenizer.
> By the way, the main issue is about the tokenizer.
Yes, I know that. If you run onnx with the bert value set to 0 like this code, the Korean voice is produced awkwardly.
> If you run onnx with the bert value set to 0 like this code, the Korean voice is produced awkwardly.
In that case, supporting Korean models from MeloTTS in sherpa-onnx may be hard.
Could you try https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/vits-mimic3-ko_KO-kss_low.tar.bz2
We already have a Korean TTS model in sherpa-onnx.
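For anyone who wants to try that archive from Python, here is a minimal sketch with the sherpa-onnx API; the file names inside the archive are my assumptions, so adjust them after unpacking:

```python
# Minimal sketch for trying the Korean VITS model above with the
# sherpa-onnx Python API. The paths assume the archive was unpacked
# into ./vits-mimic3-ko_KO-kss_low; adjust them to match its contents.
import soundfile as sf
import sherpa_onnx

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="./vits-mimic3-ko_KO-kss_low/ko_KO-kss_low.onnx",
            tokens="./vits-mimic3-ko_KO-kss_low/tokens.txt",
            data_dir="./vits-mimic3-ko_KO-kss_low/espeak-ng-data",
        ),
        num_threads=1,
    ),
)
tts = sherpa_onnx.OfflineTts(config)
audio = tts.generate("안녕하세요", sid=0, speed=1.0)
sf.write("test-ko.wav", audio.samples, audio.sample_rate)
```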
I found this repo while trying to export MeloTTS models to ONNX. When exporting to ONNX with this code, I was wondering why bert was not included. Thanks to your answer, I found out that it is because there is no C++ implementation.
I already have a Korean TTS model trained on custom data, and I just succeeded in exporting it to ONNX including the bert values. However, the preprocessing (tokenizer, etc.) still runs in Python.
The Korean MeloTTS torch model exported to ONNX is quite fast at inference. However, I need to implement the preprocessing in C++ like you did. I will try this, although Korean phoneme processing is quite difficult.
As you mentioned earlier, the biggest question is indeed "How do we implement the bert torch model in C++?" First, let's try exporting the bert model to ONNX.
Thank you for the reply.
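For that last step, a minimal sketch of exporting a HuggingFace-style BERT encoder to ONNX might look like the following; the checkpoint name is a placeholder, and torchscript=True makes the model return plain tuples during tracing:

```python
# Minimal sketch: export a HuggingFace BERT encoder to ONNX so the
# text-feature step can run outside Python. The checkpoint below is a
# placeholder; MeloTTS uses a different model per language.
import torch
from transformers import AutoModel, AutoTokenizer

name = "bert-base-multilingual-cased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name, torchscript=True)
model.eval()

inputs = tokenizer("안녕하세요", return_tensors="pt")
torch.onnx.export(
    model,
    (inputs["input_ids"], inputs["attention_mask"]),
    "bert.onnx",
    opset_version=13,
    input_names=["input_ids", "attention_mask"],
    output_names=["last_hidden_state"],
    dynamic_axes={
        "input_ids": {0: "N", 1: "L"},
        "attention_mask": {0: "N", 1: "L"},
        "last_hidden_state": {0: "N", 1: "L"},
    },
)
```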
> Currently the iOS version has to process the entire text before synthesizing the audio.
I just added the support for passing a callback from Swift to C. Please see #1218
Please play the samples received in the callback by yourself, possibly in a separate thread. We don't have time to add that.
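For reference, the same pattern in Python looks roughly like this; the model paths are placeholders, and the callback signature should be checked against your sherpa-onnx version (see python-api-examples/offline-tts-play.py):

```python
# Rough sketch: receive audio chunks from the TTS callback and play
# them on a separate thread. Paths are placeholders.
import queue
import threading

import numpy as np
import sounddevice as sd
import sherpa_onnx

chunks: "queue.Queue" = queue.Queue()

def on_samples(samples: np.ndarray, progress: float) -> int:
    chunks.put(samples.copy())
    return 1  # non-zero means: keep generating

def player(sample_rate: int):
    while (samples := chunks.get()) is not None:
        sd.play(samples, samplerate=sample_rate, blocking=True)

config = sherpa_onnx.OfflineTtsConfig(
    model=sherpa_onnx.OfflineTtsModelConfig(
        vits=sherpa_onnx.OfflineTtsVitsModelConfig(
            model="model.onnx", lexicon="lexicon.txt", tokens="tokens.txt"
        )
    )
)
tts = sherpa_onnx.OfflineTts(config)
t = threading.Thread(target=player, args=(tts.sample_rate,))
t.start()
tts.generate("hello world", sid=0, speed=1.0, callback=on_samples)
chunks.put(None)  # signal end of stream
t.join()
```

A real player would use a continuous output stream instead of repeated sd.play calls, which can leave small gaps between chunks.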
> Finally, I also noticed the iOS version can't be published to the App Store due to a framework issue.
Please have a look at https://github.com/k2-fsa/sherpa-onnx/issues/1172
By the way, contributions to sherpa-onnx are highly appreciated.
Hope that you can fix the issues by yourself.
@nanaghartey
@csukuangfj No problem. I actually made some contributions but noticed the latest version fixes most of the issues I found, for example in sherpa-onnx/jni/jni.cc:
some reserved words in Java were used, preventing porting of the sample TTS Kotlin code to Java, e.g. Java_com_k2fsa_sherpa_onnx_SpeakerEmbeddingExtractor_new.
Now all is good!
By the way, I just checked out MeloTTS, fine-tuned a model, and exported it to sherpa-onnx for Android. It's great. How can I help bring this to iOS? I'm not sure the SwiftUI TTS example accepts MeloTTS models.
> How can I help bring this to iOS? I'm not sure the SwiftUI TTS example accepts MeloTTS models.
Yes, it is already supported. In case you don't know how to do it, I just added an example for you. Please see https://github.com/k2-fsa/sherpa-onnx/pull/1223
@nanaghartey
@csukuangfj I have a single-speaker fine-tuned model (melo). It works great, but when I convert it to sherpa-onnx and then use the provided zh_en .fst and .dict on Android, I get wrong synthesis. I assumed it would work since my model is English. How can I generate the .fst and .dict files for my custom model? Or can we make it work by changing the configuration?
You don't need *.fst for English-only models.

Could you post the code showing how you add the metadata?

> , i get wrong synthesis.

Could you be more specific? What does "wrong" mean?
@csukuangfj thanks for the prompt response.
"wrong" here means unexpected output: wrong pronunciations.
Sorry, but this is how I export (the default export script only exports Chinese+English):
```python
import torch
from melo.api import TTS
from melo.text import language_id_map, language_tone_start_map
from melo.text.chinese import pinyin_to_symbol_map
from melo.text.english import eng_dict, refine_syllables
from pypinyin import Style, lazy_pinyin, phrases_dict, pinyin_dict
from typing import Any, Dict
import json

# Prepare the pinyin to symbol map
for k, v in pinyin_to_symbol_map.items():
    if isinstance(v, list):
        break
    pinyin_to_symbol_map[k] = v.split()

# Function to get initial, final, and tone from pinyin
def get_initial_final_tone(word: str):
    initials = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.INITIALS)
    finals = lazy_pinyin(word, neutral_tone_with_five=True, style=Style.FINALS_TONE3)
    ans_phone = []
    ans_tone = []
    for c, v in zip(initials, finals):
        raw_pinyin = c + v
        v_without_tone = v[:-1]
        try:
            tone = v[-1]
        except:
            return [], []
        pinyin = c + v_without_tone
        if c:
            v_rep_map = {
                "uei": "ui",
                "iou": "iu",
                "uen": "un",
            }
            if v_without_tone in v_rep_map.keys():
                pinyin = c + v_rep_map[v_without_tone]
        else:
            pinyin_rep_map = {
                "ing": "ying",
                "i": "yi",
                "in": "yin",
                "u": "wu",
            }
            if pinyin in pinyin_rep_map.keys():
                pinyin = pinyin_rep_map[pinyin]
            else:
                single_rep_map = {
                    "v": "yu",
                    "e": "e",
                    "i": "y",
                    "u": "w",
                }
                if pinyin[0] in single_rep_map.keys():
                    pinyin = single_rep_map[pinyin[0]] + pinyin[1:]
        if pinyin not in pinyin_to_symbol_map:
            continue
        phone = pinyin_to_symbol_map[pinyin]
        ans_phone += phone
        ans_tone += [tone] * len(phone)
    return ans_phone, ans_tone

# Function to generate tokens file
def generate_tokens(symbol_list):
    with open("tokens.txt", "w", encoding="utf-8") as f:
        for i, s in enumerate(symbol_list):
            f.write(f"{s} {i}\n")

# Function to add new English words to the lexicon
def add_new_english_words(lexicon):
    lexicon["kaldi"] = [["K", "AH0"], ["L", "D", "IH0"]]
    lexicon["SF"] = [["EH1", "S"], ["EH1", "F"]]

# Function to generate lexicon file
def generate_lexicon():
    word_dict = pinyin_dict.pinyin_dict
    phrases = phrases_dict.phrases_dict
    add_new_english_words(eng_dict)
    with open("lexicon.txt", "w", encoding="utf-8") as f:
        for word in eng_dict:
            phones, tones = refine_syllables(eng_dict[word])
            tones = [t + language_tone_start_map["EN"] for t in tones]
            tones = [str(t) for t in tones]
            phones = " ".join(phones)
            tones = " ".join(tones)
            f.write(f"{word.lower()} {phones} {tones}\n")
        for key in word_dict:
            if not (0x4E00 <= key <= 0x9FA5):
                continue
            w = chr(key)
            phone, tone = get_initial_final_tone(w)
            if not phone:
                continue
            phone = " ".join(phone)
            tone = " ".join(tone)
            f.write(f"{w} {phone} {tone}\n")
        for w in phrases:
            phone, tone = get_initial_final_tone(w)
            if not phone:
                continue
            phone = " ".join(phone)
            tone = " ".join(tone)
            f.write(f"{w} {phone} {tone}\n")

# Function to add metadata to ONNX model
def add_meta_data(filename: str, meta_data: Dict[str, Any]):
    import onnx

    model = onnx.load(filename)
    while len(model.metadata_props):
        model.metadata_props.pop()
    for key, value in meta_data.items():
        meta = model.metadata_props.add()
        meta.key = key
        meta.value = str(value)
    onnx.save(model, filename)

# ModelWrapper class definition
class ModelWrapper(torch.nn.Module):
    def __init__(self, model: "SynthesizerTrn"):
        super().__init__()
        self.model = model
        self.lang_id = language_id_map[model.language]

    def forward(
        self,
        x,
        x_lengths,
        tones,
        sid,
        noise_scale,
        length_scale,
        noise_scale_w,
        max_len=None,
    ):
        bert = torch.zeros(x.shape[0], 1024, x.shape[1], dtype=torch.float32)
        ja_bert = torch.zeros(x.shape[0], 768, x.shape[1], dtype=torch.float32)
        lang_id = torch.zeros_like(x)
        lang_id[:, 1::2] = self.lang_id
        return self.model.model.infer(
            x=x,
            x_lengths=x_lengths,
            sid=sid,
            tone=tones,
            language=lang_id,
            bert=bert,
            ja_bert=ja_bert,
            noise_scale=noise_scale,
            noise_scale_w=noise_scale_w,
            length_scale=length_scale,
        )[0]

# Main function to handle model loading and ONNX export
def main():
    generate_lexicon()  # Generate the lexicon.txt file

    model_path = "model.pth"  # Path to your custom model
    config_path = "config.json"  # Path to your config.json file

    with open(config_path, "r") as f:
        config = json.load(f)

    model = TTS(language="EN", device="cpu", config_path=config_path, ckpt_path=model_path)
    model.load_state_dict(torch.load(model_path, map_location="cpu"), strict=False)

    generate_tokens(config["symbols"])  # Generate tokens.txt file

    torch_model = ModelWrapper(model)

    x = torch.randint(low=0, high=10, size=(60,), dtype=torch.int64)
    x_lengths = torch.tensor([x.size(0)], dtype=torch.int64)
    sid = torch.tensor([0], dtype=torch.int64)
    tones = torch.zeros_like(x)
    noise_scale = torch.tensor([1.0], dtype=torch.float32)
    length_scale = torch.tensor([1.0], dtype=torch.float32)
    noise_scale_w = torch.tensor([1.0], dtype=torch.float32)

    x = x.unsqueeze(0)
    tones = tones.unsqueeze(0)

    filename = "model.onnx"
    torch.onnx.export(
        torch_model,
        (x, x_lengths, tones, sid, noise_scale, length_scale, noise_scale_w),
        filename,
        opset_version=13,
        input_names=["x", "x_lengths", "tones", "sid", "noise_scale", "length_scale", "noise_scale_w"],
        output_names=["y"],
        dynamic_axes={
            "x": {0: "N", 1: "L"},
            "x_lengths": {0: "N"},
            "tones": {0: "N", 1: "L"},
            "y": {0: "N", 1: "S", 2: "T"},
        },
    )

    meta_data = {
        "model_type": "melo-vits",
        "comment": "melo",
        "version": 2,
        "language": "English",
        "add_blank": int(config["data"]["add_blank"]),
        "n_speakers": config["data"]["n_speakers"],
        "jieba": 1,
        "sample_rate": config["data"]["sampling_rate"],
        "bert_dim": 1024,
        "ja_bert_dim": 768,
        "speaker_id": list(config["data"]["spk2id"].values())[0],
        "lang_id": language_id_map["EN"],
        "tone_start": language_tone_start_map["EN"],
        "url": "https://github.com/myshell-ai/MeloTTS",
        "license": "MIT license",
        "description": "MeloTTS is a high-quality multi-lingual text-to-speech library by MyShell.ai",
    }
    add_meta_data(filename, meta_data)

if __name__ == "__main__":
    main()
```
Then in api.py I do:
```python
class TTS(nn.Module):
    def __init__(self,
                 language,
                 device='auto',
                 use_hf=True,
                 config_path=None,
                 ckpt_path=None):
        super().__init__()
        if device == 'auto':
            device = 'cpu'
            if torch.cuda.is_available():
                device = 'cuda'
            if torch.backends.mps.is_available():
                device = 'mps'
        if 'cuda' in device:
            assert torch.cuda.is_available()

        # Load configuration from your custom config_path
        if config_path:
            hps = utils.get_hparams_from_file(config_path)
        else:
            hps = load_or_download_config(language, use_hf=use_hf)

        num_languages = hps.num_languages
        num_tones = hps.num_tones
        symbols = hps.symbols

        model = SynthesizerTrn(
            len(symbols),
            hps.data.filter_length // 2 + 1,
            hps.train.segment_size // hps.data.hop_length,
            n_speakers=hps.data.n_speakers,
            num_tones=num_tones,
            num_languages=num_languages,
            **hps.model,
        ).to(device)
        model.eval()
        self.model = model
        self.symbol_to_id = {s: i for i, s in enumerate(symbols)}
        self.hps = hps
        self.device = device

        # load state_dict
        checkpoint_dict = load_or_download_model(language, device, use_hf=use_hf, ckpt_path=ckpt_path)
        self.model.load_state_dict(checkpoint_dict['model'], strict=True)

        language = language.split('_')[0]
        self.language = 'ZH_MIX_EN' if language == 'ZH' else language
```
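Two things worth sanity-checking after an export like the above (my own notes, not from the thread): first, the extra model.load_state_dict(torch.load(model_path, map_location="cpu"), strict=False) call in main() may be a harmless no-op, since the TTS constructor already loads the checkpoint via ckpt_path and a raw MeloTTS checkpoint nests its weights under a 'model' key; second, that the exported file really has the inputs and metadata you expect:

```python
# Quick sanity check of an exported model: list its inputs and read back
# the custom metadata written by add_meta_data.
import onnxruntime as ort

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in sess.get_inputs()])
# expected: ['x', 'x_lengths', 'tones', 'sid', 'noise_scale', 'length_scale', 'noise_scale_w']

meta = sess.get_modelmeta().custom_metadata_map
print(meta.get("model_type"), meta.get("sample_rate"), meta.get("speaker_id"))
```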
"wrong" here means unexpected output. wrong pronunciations.
Could you post some text
and the corresponding generated wav?
please also post the logs if you use sherpa-onnx
to generate the wav with your model.
https://github.com/csukuangfj/onnxruntime-build/actions/runs/9184634501
You can see from the above link that we can successfully build a debug version of the static lib.
"wrong" here means unexpected output. wrong pronunciations.
Could you post some
text
and the corresponding generated wav?please also post the logs if you use
sherpa-onnx
to generate the wav with your model.
custom model 1: Eng, news (African accent)
text - "things to look out for in the year 2020"
.pth generated wav -
https://github.com/user-attachments/assets/b6ca93ad-c38c-412c-8c6e-45e8b6e28a84
onnx generated wav -
https://github.com/user-attachments/assets/6dead35d-4ced-4883-827c-2b7cda9941fc
custom model 2: Eng, singing (US accent)
text - "next time won't you sing with me"
.pth generated wav -
https://github.com/user-attachments/assets/4ce3f2be-a7ea-404d-be90-f4e80d712ab3
onnx generated wav -
https://github.com/user-attachments/assets/d7b0ce43-dca6-48ad-b2b6-33dc28a5ef31
I use sherpa-onnx but don't get logs. I was only trying out Melo on sherpa, so the models were not trained for long (training is not the issue, though).
I hope you're able to spot the issue. Thanks
@csukuangfj I can also share my model.pth and config.json files if that'd help.
When you use .pth to test your model, can you zero out the bert part and try again?
The results are still better than ONNX's when I zero out the bert part.
Could you show the code about how you did that?
In api.py, in `def tts_to_file()`, I did:

```python
bert = torch.zeros_like(bert).to(device)
```

Please share your solution if that is wrong.
Could you please post the complete code?
```python
def tts_to_file(self, text, speaker_id, output_path=None, sdp_ratio=0.2, noise_scale=0.6, noise_scale_w=0.8, speed=1.0, pbar=None, format=None, position=None, quiet=False,):
    language = self.language
    texts = self.split_sentences_into_pieces(text, language, quiet)
    audio_list = []
    if pbar:
        tx = pbar(texts)
    else:
        if position:
            tx = tqdm(texts, position=position)
        elif quiet:
            tx = texts
        else:
            tx = tqdm(texts)
    for t in tx:
        if language in ['EN', 'ZH_MIX_EN']:
            t = re.sub(r'([a-z])([A-Z])', r'\1 \2', t)
        device = self.device
        bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
        # bert = torch.zeros_like(bert).to(device)
        # ja_bert = torch.zeros_like(ja_bert).to(device)
        with torch.no_grad():
            x_tst = phones.to(device).unsqueeze(0)
            tones = tones.to(device).unsqueeze(0)
            lang_ids = lang_ids.to(device).unsqueeze(0)
            bert = bert.to(device).unsqueeze(0)
            ja_bert = ja_bert.to(device).unsqueeze(0)
            x_tst_lengths = torch.LongTensor([phones.size(0)]).to(device)
            del phones
            speakers = torch.LongTensor([speaker_id]).to(device)
            audio = self.model.infer(
                x_tst,
                x_tst_lengths,
                speakers,
                tones,
                lang_ids,
                bert,
                ja_bert,
                sdp_ratio=sdp_ratio,
                noise_scale=noise_scale,
                noise_scale_w=noise_scale_w,
                length_scale=1. / speed,
            )[0][0, 0].data.cpu().float().numpy()
            del x_tst, tones, lang_ids, bert, ja_bert, x_tst_lengths, speakers
        audio_list.append(audio)
        torch.cuda.empty_cache()
    audio = self.audio_numpy_concat(audio_list, sr=self.hps.data.sampling_rate, speed=speed)
    if output_path is None:
        return audio
    else:
        if format:
            soundfile.write(output_path, audio, self.hps.data.sampling_rate, format=format)
        else:
            soundfile.write(output_path, audio, self.hps.data.sampling_rate)
```
> In api.py, in `def tts_to_file()`, I did: `bert = torch.zeros_like(bert).to(device)`
>
> Please share your solution if that is wrong.
Could you change

```python
bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
# bert = torch.zeros_like(bert).to(device)
# ja_bert = torch.zeros_like(ja_bert).to(device)
```

to

```python
bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
bert.zero_()
ja_bert.zero_()
```
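For what it's worth (an aside from me, not the thread), both variants feed zeros to infer(): torch.zeros_like allocates a fresh zero tensor, while Tensor.zero_() overwrites the existing one in place.

```python
# In-place zeroing vs. allocating a new zero tensor: the values that
# reach the model are identical either way.
import torch

a = torch.rand(2, 3)
b = torch.zeros_like(a)  # new tensor; a is unchanged
a.zero_()                # zeroes a itself, in place
print(torch.equal(a, b))  # True
```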
@csukuangfj

> bert, ja_bert, phones, tones, lang_ids = utils.get_text_for_tts_infer(t, language, self.hps, device, self.symbol_to_id)
> bert.zero_()
> ja_bert.zero_()
The result is a generated wav that sounds almost the same as the original .pth inference (without zeroing out), except for a few pronunciations that sound off. However, it's way better than the wavs from onnx above. Here is the output with bert zeroed out:
https://github.com/user-attachments/assets/79b12910-318a-432d-8d08-0687d31e566b
https://github.com/user-attachments/assets/237d7511-a562-4674-b893-ddb2e5de54ea
I then tried:

```python
bert = torch.zeros(x.shape[0], 1024, x.shape[1], dtype=torch.float32)
ja_bert = torch.zeros(x.shape[0], 768, x.shape[1], dtype=torch.float32)
bert.zero_()
ja_bert.zero_()
```

in export-onnx.py for the ONNX conversion, but I got the same "wrong" results shared earlier.
Please compare the inputs to the model manually and see if they are the same.
My .pth has:

```
BERT input shape: torch.Size([1024, 71])
JA_BERT input shape: torch.Size([768, 71])
Phones input shape: torch.Size([71])
Tones input shape: torch.Size([71])
Language IDs shape: torch.Size([71])
```

What changes can I make to the ONNX export script, or is there any other way to get this EN model to work with sherpa-onnx TTS? :(
https://github.com/k2-fsa/sherpa-onnx/blob/master/scripts/melo-tts/test.py

Please use this script to test the ONNX model.

By comparing the model inputs, I mean comparing the values of the inputs, including the shapes.
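A small helper along these lines (illustrative, not part of the repo) makes that comparison concrete:

```python
# Illustrative helper for comparing a torch-side input with its
# ONNX-side counterpart, both shape and values.
import numpy as np

def compare_input(name: str, torch_tensor, onnx_array, atol: float = 1e-5):
    t = torch_tensor.detach().cpu().numpy()
    o = np.asarray(onnx_array)
    if t.shape != o.shape:
        print(f"{name}: SHAPE MISMATCH {t.shape} vs {o.shape}")
        return
    diff = np.abs(t.astype(np.float64) - o.astype(np.float64)).max()
    status = "OK" if np.allclose(t, o, atol=atol) else f"VALUES DIFFER (max |diff| = {diff:.3g})"
    print(f"{name}: shape {t.shape} matches, {status}")
```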
Below is the output. What next step should I take?
```
Dumping model to file cache /var/folders/vf/13g26rdn3673b6cxhlhy03yh0000gn/T/jieba.cache
Loading model cost 1.013 seconds.
Prefix dict has been built successfully.
这是
t 这是
w 这
这
w 是
是
一个
使用
t 使用
w 使
使
w 用
用
next
generation
kaldi
的
text
to
speech
中英文
t 中英文
w 中
中
w 英
英
w 文
文
例子
.
Thank
t Thank
w T
T
t T
w h
h
w a
a
w n
n
w k
k
you
!
你
觉得
如何
呢
?
are
you
ok
?
Fantastic
t Fantastic
w F
F
t F
w a
a
w n
n
w t
t
w a
a
w s
s
w t
t
w i
i
w c
c
!
How
t How
w H
H
t H
w o
o
w w
w
about
you
?
torch.Size([265]) torch.Size([265])
torch.Size([1, 265]) torch.Size([1, 265])
```
Please compare the input tensor values.
@csukuangfj I tried that but still didn't get good results. I even tried exporting the official MeloTTS models on Hugging Face: https://huggingface.co/myshell-ai
At this point I think I may just have to continue using piper/coqui, though it doesn't sound as good as MeloTTS. Thanks for all the support :)
By the way, the comparison is to help debugging.
When I call vits-melo-tts-zh_en through sherpa-onnx GPU (1.10.17+cuda), I get an error (CPU works). What could be the reason?

```
python3 ./python-api-examples/offline-tts-play.py --vits-model=./vits-melo-tts-zh_en/model.onnx
```
We can't solve this problem at the moment.
OK, thanks.
Does anyone have a Google Colab notebook for this, to convert the models? I need Japanese TTS voices.
Please see https://colab.research.google.com/drive/1XsKyAXti1e6_qYiJ3Fiyt8E7d1lPch75?usp=sharing
It is for the Chinese+English MeloTTS model.
Is there one for English only? In the future, if there is a way to convert a standard English model from the official training script, can you share it here? Thanks
Sorry, I only have this one.
> When I call vits-melo-tts-zh_en through sherpa-onnx GPU (1.10.17+cuda), I get an error (CPU works).

Please use onnxruntime 1.12.0.

In the WeChat group, someone reported that melo tts runs without problems on GPU with onnxruntime 1.12.0. @dhc45010
> When I call vits-melo-tts-zh_en through sherpa-onnx GPU (1.10.17+cuda), I get an error (CPU works).

Please see https://github.com/k2-fsa/sherpa-onnx/pull/1379 @dhc45010
OK, thanks for the reply. I'll try it later.
@csukuangfj any updates on getting the default MeloTTS models to work?
Could you describe the issue you have? @nanaghartey
> Could you describe the issue you have?
There is support for the Chinese+English MeloTTS model only. If one wants to use MeloTTS, they have to stick to the Chinese+English model. I'm asking if there are any updates/documentation on converting, e.g., standard English MeloTTS models.
Please adapt our current script. If you have any troubles, please post error logs.
I already tried that above and it didn't work .
Thank you for creating a great repository. I wonder why there is no bert input when converting a PyTorch model of MeloTTS to an ONNX model. https://github.com/k2-fsa/sherpa-onnx/blob/963aaba82b01a425ae8dcf0fdcff6b073a45686f/scripts/melo-tts/export-onnx.py#L206C1-L235C6