[ISSUE for v2]: Voice breaking and accent not being picked up

asr-aditya commented 1 month ago

Voice Changer Version

vccclient_win_cuda_2.0.65-beta.zip

Operational System

Windows Server 2022

GPU

Tesla T4

CUDA Version

12.6

Read carefully and check the options

[X] If you use win_cuda_torch_cuda edition, setup cuda? see here
[ ] If you use win_cuda edition, setup cuda and cudnn? see here
[ ] If you use mac edition, client is not launched automatically. Use chrome to open application.?
[X] I've tried to change the Chunk Size
[X] I've tried to set the Index to zero
[X] I've read the tutorial
[X] I've tried to extract to another folder (or re-extract) the .zip file

Does pre-installed model work?

YES

Model Type

RVC

Issue Description

I am trying to convert my voice to Hollywood movie characters. I trained models on 30 minutes of voice clips using Mangio-RVC. When I run inference with those models by uploading an audio file there, or in any of the Hugging Face implementations, I get good results. However, I cannot get the same results with the realtime RVC client from this project. Am I missing something that needs to be configured to get the same results in realtime with the RVC client?

Application Screenshot

Many documents and issues suggest that maxing out the index helps, but in my case it did not show any improvement. How can I check whether the index files are even being used?

[Image: Screenshot 2024-10-20 at 11 51 33 PM]

Logs on console

I am not sure logs would help in this case. Functionally, everything works fine; it is just the output quality that I need to improve. Let me know if any logs would be helpful and I will add them here.

Kuuko-fokkusugaru commented 1 month ago

It's hard to know exactly what you mean when you say it doesn't sound the same without some examples to compare.

Either way, I can see that your chunk and extra values are quite low, which may result in worse quality. You can set extra to 5 seconds or higher and chunk to 0.3 - 0.5 seconds. The quality will also change a lot depending on the F0 detection used. Dio is quite lightweight and made for the CPU; try rmvpe, which gives better results and uses the GPU.
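
To get a feel for what those chunk and extra values mean in practice, here is a rough sketch that turns them into sample counts and a lower bound on added delay. It assumes that chunk is the amount of new audio converted per inference call and that extra is past audio prepended as context, which is my reading of the settings rather than the client's actual code. The 48000 Hz sample rate and the names `BufferPlan` and `plan_buffers` are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BufferPlan:
    chunk_samples: int     # new audio converted per inference call (assumed meaning of "chunk")
    extra_samples: int     # past audio prepended as context (assumed meaning of "extra")
    window_samples: int    # total samples the model would see per call
    min_latency_ms: float  # time needed just to collect one chunk, before any processing


def plan_buffers(chunk_sec: float, extra_sec: float, sample_rate: int = 48000) -> BufferPlan:
    """Convert chunk/extra durations in seconds to sample counts.

    Hypothetical helper: the sample rate is an assumption, not a value
    read from the voice changer's configuration.
    """
    if chunk_sec <= 0 or extra_sec < 0 or sample_rate <= 0:
        raise ValueError("chunk_sec and sample_rate must be positive, extra_sec non-negative")
    chunk = round(chunk_sec * sample_rate)
    extra = round(extra_sec * sample_rate)
    return BufferPlan(
        chunk_samples=chunk,
        extra_samples=extra,
        window_samples=chunk + extra,
        min_latency_ms=1000.0 * chunk / sample_rate,
    )


if __name__ == "__main__":
    # The values suggested above: extra of 5 seconds, chunk of 0.3 to 0.5 seconds.
    for chunk_sec in (0.3, 0.5):
        print(chunk_sec, plan_buffers(chunk_sec, extra_sec=5.0))
```

Under these assumptions a larger chunk raises the minimum delay, while a larger extra only widens the context window handed to the model, so it costs compute rather than delay. That is why it can be pushed up on a GPU like the T4.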

w-okada commented 2 weeks ago

no response.

w-okada / voice-changer