myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.
MIT License

Is it possible to export to onnx? #98

Closed AngelGuevara7 closed 3 months ago

AngelGuevara7 commented 6 months ago

Hello, I just discovered the ONNX format and its speed advantages. Has anyone tried exporting MeloTTS to ONNX?

jeremy110 commented 6 months ago

That is possible; you can refer to these: https://github.com/fishaudio/Bert-VITS2/blob/master/export_onnx.py https://github.com/fishaudio/Bert-VITS2/blob/master/onnx_infer.py

walletiger commented 5 months ago

Which branch or commit of Bert-VITS2 does MeloTTS correspond to? I checked, and the models differ quite a bit, e.g. the TextEncoder.

jeremy110 commented 5 months ago

@walletiger The basic architectures are mostly the same, but MeloTTS supports more languages and uses IPA. The latest Bert-VITS2 has added WavLM and emotion embeddings to the basic architecture. Earlier versions (before 2.0.0) seem to be quite similar.

csukuangfj commented 3 months ago

Please have a look at https://github.com/myshell-ai/MeloTTS/issues/164

You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.

pengpengtao commented 3 months ago

> Please have a look at #164
>
> You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.

I have already exported to ONNX, but I want to synthesize different voices based on the sid. The config file has fields like "n_speakers": 256, "spk2id": { "ZH": 1 }. I'm guessing there are either 256 different speakers, or n_speakers is just a dimension of size 256. I changed the value of n_speakers, but the voice did not seem to change.

pengpengtao commented 3 months ago

Looking at this class (screenshot omitted), I see no operation on speak_id at all, yet the original code can change the voice based on it. Generally the gap between male and female voices is fairly large; how does it tell male and female timbres apart?

csukuangfj commented 3 months ago

The Chinese-English model has only one speaker, and its speaker_id is fixed at 1.

I haven't tried the other models.

pengpengtao commented 3 months ago

> The Chinese-English model has only one speaker, and its speaker_id is fixed at 1.
>
> I haven't tried the other models.

OK, I'll try the other models.

csukuangfj commented 3 months ago

If the speaker id is wrong, the generated audio is silent. It took me a long time of debugging to find this problem.

I hope you can avoid it.

pengpengtao commented 3 months ago

> If the speaker id is wrong, the generated audio is silent. It took me a long time of debugging to find this problem.
>
> I hope you can avoid it.

I just tried it: I added a sid, and no matter which sid I use there is no sound, so I'm confused. Doesn't it select the voice according to the sid? There should be at least two ids to cover male and female voices; or is the timbre extractor so powerful that it can fully disentangle timbre regardless of gender?

csukuangfj commented 3 months ago

> No matter which sid I use there is no sound

How many sids did you try before reaching that conclusion?

csukuangfj commented 3 months ago

There must be some sid that produces sound. Did you exhaust everything from 1 to 1<<31? If not, have you tried 0 to 10?
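A hypothetical sweep over a small range of candidate speaker ids could look like this; `synthesize` is a placeholder for whatever inference call returns raw float samples, and the silence threshold is an arbitrary choice:

```python
def is_silent(samples, threshold=1e-4):
    """True if every sample's amplitude is below the threshold."""
    return max(abs(s) for s in samples) < threshold

def find_voiced_sids(synthesize, candidate_sids=range(0, 11)):
    """Return the speaker ids whose output is not silent.

    `synthesize(sid)` is assumed to return a list of float samples.
    """
    return [sid for sid in candidate_sids if not is_silent(synthesize(sid))]
```

This saves listening to each file by hand: any sid whose output never rises above the threshold can be discarded immediately.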

AngelGuevara7 commented 3 months ago

> Please have a look at #164
>
> You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.

Thanks for your solution! I tested it with my custom models and it worked perfectly! :) I noticed a 30 MB reduction in model size (from about 190 MB to 160 MB), but the inference speed is almost the same. Did you compare the inference speed between the PyTorch model and the ONNX model?

pengpengtao commented 3 months ago

> No matter which sid I use there is no sound
>
> How many sids did you try before reaching that conclusion?

I tried sids 255 and 200. If I have to try them all, that's doable too: just generate them one by one and listen to each.

csukuangfj commented 3 months ago

@pengpengtao

https://github.com/myshell-ai/MeloTTS/blob/144a0980fac43411153209cf08a1998e3c161e10/melo/app.py#L33

You need to refer to this and use one of the numbers in

models[language].hps.data.spk2id
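In other words, the valid speaker ids are the values of `spk2id` in the model's config, not arbitrary integers below `n_speakers`. A sketch with a made-up config fragment (the real `config.json` has many more fields):

```python
import json

# Hypothetical excerpt of a MeloTTS config.json.
config_text = '{"data": {"n_speakers": 256, "spk2id": {"ZH": 1}}}'
hps = json.loads(config_text)

spk2id = hps["data"]["spk2id"]
sid = spk2id["ZH"]   # the only valid sid for this model: 1
```

So for the Chinese-English model the whole sweep collapses to a single valid value, which matches the earlier comment that its speaker_id is fixed at 1.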

csukuangfj commented 3 months ago

> Please have a look at #164 You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.
>
> Thanks for your solution! I tested it with my custom models and it worked perfectly! :) I noticed a 30 MB reduction in model size (from about 190 MB to 160 MB), but the inference speed is almost the same. Did you compare the inference speed between the PyTorch model and the ONNX model?

It's great to hear that it works for you.

> Did you compare the inference speed between the PyTorch model and the ONNX model?

Unfortunately, we have not done that.
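For anyone who wants to run the comparison themselves, a simple wall-clock harness is enough (a sketch; the calls in the trailing comment are hypothetical, so substitute your own PyTorch and onnxruntime invocations):

```python
import time

def benchmark(fn, n_runs=20, warmup=3):
    """Average wall-clock seconds per call of fn(), after warmup runs."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(n_runs):
        fn()
    return (time.perf_counter() - start) / n_runs

# Hypothetical usage, same text and speaker id on both backends:
# pt_time  = benchmark(lambda: model.tts_to_file(text, sid, "out.wav"))
# ort_time = benchmark(lambda: session.run(None, onnx_inputs))
```

The warmup runs matter here: both backends do lazy initialization (CUDA kernels, graph optimization) on the first call, which would otherwise skew the average.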

AngelGuevara7 commented 3 months ago

> Please have a look at #164
>
> You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.

I'll close the issue because this comment solves it. Feel free to reopen it.

nanaghartey commented 2 months ago

> Please have a look at #164 You can also run the exported ONNX models on Android, iOS, Raspberry Pi, etc., using C++.
>
> Thanks for your solution! I tested it with my custom models and it worked perfectly! :) I noticed a 30 MB reduction in model size (from about 190 MB to 160 MB), but the inference speed is almost the same. Did you compare the inference speed between the PyTorch model and the ONNX model?

Can you share your script for converting to ONNX? I also tried converting my custom English model but got some awkward pronunciations.