How to use Custom Voice Models

harry0703 / MoneyPrinterTurbo

Generate high-definition short videos with one click using large AI models (LLMs).

MIT License

16.18k stars 2.56k forks

How to use Custom Voice Models #136

Open feijoes opened 5 months ago

feijoes commented 5 months ago

Is there any way to use my own custom voice model, for example one from https://voice-models.com/?

harry0703 commented 5 months ago

Cool, are there any APIs?

feijoes commented 5 months ago

@harry0703 I'm not entirely sure, but I think it would be useful to let users specify a link to the downloaded zip file that contains the custom voice model. That would make personalized models easier to integrate and use.

feijoes commented 5 months ago

Maybe you can use https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI or other similar projects to convert the edge-tts voice with the user's custom model. If you think it's a good idea, I can fork the repo and work on it myself.
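
The two-stage idea in the comment above (synthesize speech with edge-tts, then pass it through a voice-conversion model) can be sketched as follows. This is only an illustration, not code from the project: `speak_with_custom_voice` and the `convert` callable are hypothetical names, the voice `en-US-AriaNeural` is just an example, and the RVC step is left as a pluggable function because its exact API is not described in this thread. The `edge_tts` package is a third-party dependency and is imported lazily.

```python
import tempfile
from pathlib import Path
from typing import Callable

# A synthesizer writes speech for `text` to a file; a converter re-voices an
# audio file with a custom model. Both signatures are assumptions.
Synthesizer = Callable[[str, Path], None]
Converter = Callable[[Path, Path], None]


def edge_tts_synthesize(text: str, out_path: Path,
                        voice: str = "en-US-AriaNeural") -> None:
    """Base TTS step using the third-party edge-tts package."""
    import asyncio
    import edge_tts  # imported lazily so the rest of the sketch runs without it

    asyncio.run(edge_tts.Communicate(text, voice).save(str(out_path)))


def speak_with_custom_voice(text: str, out_path: Path,
                            synthesize: Synthesizer,
                            convert: Converter) -> Path:
    """Run base TTS into a temporary file, then convert it to the custom voice."""
    out_path = Path(out_path)
    with tempfile.TemporaryDirectory() as tmp:
        base_audio = Path(tmp) / "base.mp3"
        synthesize(text, base_audio)   # stage 1: plain TTS voice
        convert(base_audio, out_path)  # stage 2: hypothetical RVC conversion
    return out_path
```

Keeping the converter as a plain callable means the conversion backend (an RVC wrapper, an HTTP call to a local service, or something else) can be swapped without touching the TTS step.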

feijoes commented 5 months ago

Hey @harry0703, I'm working on implementing the custom model feature. Do you have a Discord, an email address, or some other channel for updates and questions? Or should I just post them here in this issue?

harry0703 commented 5 months ago

Thank you very much. We can communicate and exchange ideas directly here. I will check and reply in a timely manner.

9xcoder commented 5 months ago

Maybe you can use https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI or other similar projects to convert the edge-tts voice with the user's custom model. If you think it's a good idea, I can fork the repo and work on it myself.

Can you introduce some projects or simple ways to use pre-trained models from the above website?

feijoes commented 5 months ago

@9xcoder What do you mean? Here is a tutorial on how to use RVC correctly: https://www.youtube.com/watch?v=hSxTLCR_95Y

feijoes commented 5 months ago

Hey @harry0703, here's an update on the issue. I've added https://github.com/skshadan/TTS-RVC-API to this project as a submodule and removed all the unnecessary parts. For now, it works as an external API. My plan is to integrate it with the rest of the project rather than leave it as an external API on another port. I'm also planning to add a button to the GUI that lets users enter the link to an external model, downloads it, extracts it into the correct folder, and then lets them select it as a speaker.

However, I've run into some issues. Including this project changes the requirements significantly, because all audio conversion with a custom model now runs locally on the user's computer instead of through a remote API. For example, the Dockerfile build now takes 30 minutes on my PC, even after removing all unnecessary files. Perhaps this should be an optional feature of the project.
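
The download, extract, and select flow described above could look roughly like this. It is a minimal sketch using only the Python standard library, not the code in feijoes's fork: the function names, the `models_dir` layout (one folder per model), and the `name` parameter are all assumptions made for illustration.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def install_voice_model(url: str, models_dir: Path, name: str) -> Path:
    """Download a zipped voice model and extract it into models_dir/name."""
    target = Path(models_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp) / "model.zip"
        # Stream the download to disk instead of holding it in memory.
        with urllib.request.urlopen(url) as response, open(archive, "wb") as fh:
            shutil.copyfileobj(response, fh)
        # zipfile raises BadZipFile if the link did not point at a zip archive.
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
    return target


def list_voice_models(models_dir: Path) -> list[str]:
    """Return installed model names, e.g. to fill a speaker dropdown in the GUI."""
    models_dir = Path(models_dir)
    if not models_dir.exists():
        return []
    return sorted(p.name for p in models_dir.iterdir() if p.is_dir())
```

A GUI button would call `install_voice_model` with the link the user entered and then refresh the speaker list from `list_voice_models`.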

harry0703 commented 5 months ago

@feijoes This is a great idea, as some users have similar needs, and it can be flexibly configured as an optional feature. Thank you for your contribution, really looking forward to it.
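
One way to keep such a feature optional is to enable it only when the user turns it on and the heavy dependencies are actually installed. The sketch below is an assumption about how that could be done, not the project's configuration mechanism: `custom_voice_available` is a hypothetical helper, and `torch` is used as the example dependency only because RVC-style models typically run on PyTorch.

```python
import importlib.util
from typing import Iterable


def custom_voice_available(enabled: bool,
                           required_modules: Iterable[str] = ("torch",)) -> bool:
    """Return True only if the feature is switched on and its packages exist.

    `required_modules` should contain top-level package names; the default
    is an illustrative guess, not a list taken from the project.
    """
    if not enabled:
        return False
    # find_spec returns None for a top-level module that is not installed,
    # so nothing heavy is imported just to perform this check.
    return all(importlib.util.find_spec(m) is not None for m in required_modules)
```

With a check like this, the default install and Docker image can stay small, and the custom voice option only appears for users who installed the extra requirements.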

9xcoder commented 5 months ago

@feijoes Sorry for my poor English. What I actually want is to integrate this project (TTS-RVC-API) into another project of mine. Thank you for the YouTube link; it gives me a method for training the model.

feijoes commented 5 months ago

@9xcoder Sure, follow the project's README and it should work. You may want to remove some of the unnecessary files, but the project has all the minimum requirements needed to run it. If you need any help, please let me know.

saleham5 commented 3 months ago

@feijoes Hey, I opened an issue about using other TTS engines, but no one has responded so far. I wanted to ask whether you know how to implement Openvoice v2. I tried, but it doesn't seem that easy. You seem to have experience, though, so maybe you could help.

feijoes commented 3 months ago

@saleham5 Hi, I haven't used Openvoice v2 before, but I think it's very similar to what I was trying to achieve. Because of work and college, I didn't have time to finish the implementation. However, you can check out my fork and see the changes I made. This project wasn't designed to accept other TTS engines, so I modified it a bit. It's incomplete, but if you need any help, let me know.