w-okada / voice-changer

Realtime Voice Changer

[REQUEST]: Use more CPU and GPU resources to speed up voice conversion. #1260

Open haksal732 opened 2 days ago

haksal732 commented 2 days ago

In a few words, describe your idea

Use more CPU and GPU resources to speed up voice conversion.

More information

1 23

When using voice changer, use more CPU and GPU resources to speed up voice conversion.

KuukoShan commented 2 days ago

The voice conversion time is limited not only by hardware but also by the chunk size. This is hard to change, because the chunk size is the amount of audio you send for processing in one go. The bigger the chunk size, the more audio time you are sending, and the more audio you send, the more accurate the conversion can be, since it contains more information. It is hard for the software to guess how to convert properly if the information it receives is minimal. That's why it doesn't make more use of your hardware: it simply doesn't need to. And I don't see this changing. Regardless of the hardware's capabilities, it will always need enough audio time to process properly. The only way to reduce the time needed to convert the voice without issues and without sounding robotic would be to use a different approach and let the AI "guess" how it should sound even without that information.

That said, you can lower the chunk size to reduce the delay. But as I mentioned, if you send too little data, the AI won't be able to predict properly how it should sound, so phonemes may cut off unexpectedly or sound off or robotic.
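
To put rough numbers on the chunk size point, here is a minimal sketch of the arithmetic. It is not taken from the voice-changer code: the function names, the 48000 Hz sample rate, the chunk size in samples, and the inference times are all assumptions chosen for illustration. The idea is only that the delay is at least the audio length of one chunk plus the time needed to convert it, so faster hardware shrinks only the second term.

```python
def chunk_duration_ms(chunk_samples: int, sample_rate_hz: int) -> float:
    """Length of audio contained in one chunk, in milliseconds."""
    return 1000.0 * chunk_samples / sample_rate_hz


def min_latency_ms(chunk_samples: int, sample_rate_hz: int, inference_ms: float) -> float:
    """Lower bound on delay: wait for the chunk to fill, then convert it."""
    return chunk_duration_ms(chunk_samples, sample_rate_hz) + inference_ms


def keeps_up(chunk_samples: int, sample_rate_hz: int, inference_ms: float) -> bool:
    """True if each chunk is converted before the next one has finished recording."""
    return inference_ms <= chunk_duration_ms(chunk_samples, sample_rate_hz)


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 48000  # assumed sample rate
    CHUNK_SAMPLES = 12288   # assumed chunk size, in samples

    # Hypothetical inference times: a slower and a faster machine.
    for inference_ms in (40.0, 10.0):
        total = min_latency_ms(CHUNK_SAMPLES, SAMPLE_RATE_HZ, inference_ms)
        ok = keeps_up(CHUNK_SAMPLES, SAMPLE_RATE_HZ, inference_ms)
        print(f"inference {inference_ms} ms: delay >= {total} ms, keeps up: {ok}")
```

With these assumed values the chunk alone accounts for 256 ms, so cutting inference from 40 ms to 10 ms only moves the lower bound from 296 ms to 266 ms. Halving the chunk size would save far more, at the cost of giving the model less audio to work with.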