[ISSUE for v2]: Sounding terrible.

Mikey-Mikey commented 3 months ago

Voice Changer Version

vcclient_win_cuda_2.0.24-alpha.zip

Operating System

Windows 10

GPU

RTX 2060

CUDA Version

CUDA 12.5.78

Read carefully and check the options

[ ] If you use win_cuda_torch_cuda edition, setup cuda? see here
[X] If you use win_cuda edition, setup cuda and cudnn? see here
[ ] If you use mac edition, client is not launched automatically. Use chrome to open application.?
[X] I've tried to change the Chunk Size
[X] I've read the tutorial
[X] I've tried to extract to another folder (or re-extract) the .zip file

Model Type

RVC

Issue Description

My issue right now is that it just sounds terrible, repeats itself over and over, and is really loud. There are no error logs, or any logs at all. I've tried switching the output off and it still repeats over and over and sounds really bad. I'm also using the exact same model that I trained myself and used on v1.

Application Screenshot

Logs on console

None.

Mikey-Mikey commented 2 months ago

also could you try just talking instead of singing?

Mikey-Mikey commented 2 months ago

because my issue kinda goes away when yelling / singing.

Kuuko-fokkusugaru commented 2 months ago

Are you using the model I provided earlier? In my environment, it seems to be converting quite well.

What F0 estimator are you using? I recommend rmvpe_onnx.

output.mp4

Hello there!

I am having the same issue. I have included a zip file where you can compare how it sounds. The source was a WAV file I recorded myself so that the comparison would be fair. I have tried to match the settings between v1 and v2. The distortion between the ONNX file and the PTH file happens even in v1, but only very slightly. The distortion in v2 is so extreme that it is unusable (why does ONNX have lower quality than PTH models?). I converted the PTH file to ONNX using v1 and used the same files across all versions. I ran 6 tests with 3 different software versions, including v1, and with 2 file types for the same model (PTH and ONNX). I hope this is helpful.

The file names contain the necessary information. For example: "v1.5.3.18a [pth] (rmvpe 192 - 131072).wav"

The first part is the version. In square brackets [] is the file type of the voice model. In parentheses are the F0 detection method, the chunk size, and the extra size.
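For anyone who wants to sort or tabulate the test files, the naming scheme described above can be parsed with a small script. This is only a sketch: the `TestFile` type and `parse_test_filename` function are hypothetical names made up here, and the pattern assumes every file follows exactly the layout of the example name.

```python
import re
from typing import NamedTuple, Optional


class TestFile(NamedTuple):
    """Settings encoded in a test file name (hypothetical helper type)."""
    version: str       # software version, e.g. "v1.5.3.18a"
    model_type: str    # voice model file type, e.g. "pth" or "onnx"
    f0_detector: str   # F0 detection method, e.g. "rmvpe"
    chunk_size: int    # chunk size as written in the name
    extra_size: int    # extra size as written in the name


# Assumed layout: "<version> [<model type>] (<f0> <chunk> - <extra>).wav"
_PATTERN = re.compile(
    r"(?P<version>\S+) \[(?P<model_type>\w+)\] "
    r"\((?P<f0>\S+) (?P<chunk>\d+) - (?P<extra>\d+)\)\.wav"
)


def parse_test_filename(name: str) -> Optional[TestFile]:
    """Return the parsed settings, or None if the name does not match."""
    m = _PATTERN.fullmatch(name)
    if m is None:
        return None
    return TestFile(
        version=m["version"],
        model_type=m["model_type"],
        f0_detector=m["f0"],
        chunk_size=int(m["chunk"]),
        extra_size=int(m["extra"]),
    )
```

Calling `parse_test_filename("v1.5.3.18a [pth] (rmvpe 192 - 131072).wav")` gives a record with version `v1.5.3.18a`, model type `pth`, F0 detector `rmvpe`, chunk size 192, and extra size 131072. Names that do not follow the scheme return `None`.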

I am using an Nvidia 3080Ti GPU and an i7 13700K CPU.

Kuuko audio test.zip

w-okada commented 2 months ago

Sorry, in my environment and with my hearing, I can't tell the difference.

https://github.com/w-okada/voice-changer/assets/48346627/5bdd1ce3-082c-4924-ad49-db9eb27560b5

https://github.com/w-okada/voice-changer/assets/48346627/6dcb8ca9-8ff6-44dd-9312-3bd152eabbf0

Kuuko-fokkusugaru commented 2 months ago

I will try to make a more challenging audio example for the software. But it's really weird if you can't notice the difference in the files that I sent. My ONNX file in v2 sounds like a broken Vocaloid lol. It's very noticeable when I speak a bit louder. It does not happen in v1 with the same WAV input file.

Is there any chance that ONNX in v2 is broken on my side? When I load an ONNX model, I get some yellow messages in the console. I will try to take a screenshot. Those yellow messages do not appear when loading PTH files, but I need to do more testing first.

Mikey-Mikey commented 2 months ago

Sorry, in my environment and with my hearing, I can't tell the difference.

myouou_pth.rmvpe_onnx.24000.16320.mp4 myouou_onnx.rmvpe_onnx.24000.16320.mp4

Interesting that you're not getting the autotune issue but I am.

w-okada commented 2 months ago

I was unable to reproduce the issue clearly. If possible, could you share the audio data that causes the problem?

Mikey-Mikey commented 2 months ago

The issue happens when you have a high extra. And the higher the Extra amount is the worse it gets. Also I'm now using 2.0.40 alpha and there's some loud popping noise which I think is due to the console saying NaN on the out side of the vol section.

Mikey-Mikey commented 2 months ago

The NaN issue only happens when the decibels get really low or silent.
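A minimal sketch of how a silent input could end up as NaN in a volume-matching step. This is not the voice-changer code, and the cause is an assumption: the function names (`rms`, `match_volume`) and the `eps` threshold are made up for illustration. With floating-point array math (NumPy floats, for example), scaling by source level divided by output level gives 0/0 = NaN when both are silent, so a guard like this one is a common fix.

```python
import math
from typing import List


def rms(samples: List[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def match_volume(converted: List[float], source_rms: float,
                 eps: float = 1e-8) -> List[float]:
    """Scale `converted` so its RMS matches `source_rms`.

    If the converted block is (near) silent, return silence instead of
    dividing by a zero level, which is where a NaN or a huge gain
    (a loud pop) would otherwise come from.
    """
    out_rms = rms(converted)
    if out_rms < eps:
        return [0.0] * len(converted)
    gain = source_rms / out_rms
    return [s * gain for s in converted]
```

With this guard, `match_volume([0.0, 0.0, 0.0, 0.0], 0.0)` returns four zeros rather than NaN values, while a non-silent block is scaled as usual.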

Mikey-Mikey commented 2 months ago

Here's the NaN issue.

Mikey-Mikey commented 2 months ago

I'm gonna test the autotune issue real quick on my new 1000 epoch male_07 model that uses the TITAN-Medium pretrain.

Mikey-Mikey commented 2 months ago

Nope, still having the autotune issue. It happens when you ramp your own voice's pitch up when saying something like "hello".

w-okada commented 2 months ago

@Mikey-Mikey

The issue happens when you have a high extra. And the higher the Extra amount is the worse it gets. Also I'm now using 2.0.40 alpha and there's some loud popping noise which I think is due to the console saying NaN on the out side of the vol section.

Did this issue not occur in versions prior to v2.0.40? Starting from v2.0.40, we made the volume adjustment method a bit more aggressive, which might have caused this issue.

https://github.com/w-okada/voice-changer/issues/1266

Mikey-Mikey commented 2 months ago

Only the Extra autotune issue happened in previous versions. The popping and NaN issue has so far only been in 2.0.40.

Kuuko-fokkusugaru commented 2 months ago

I have this issue not only with my model (which was a PTH file converted to ONNX) but also with the included ones. I am attaching two files for easy comparison, one from v1 and another from v2. This time I used the Amitaro voice, where the differences are clearer. I am using similar settings for both, but v2 has a slightly bigger chunk size, which should result in higher quality anyway (but it clearly does not). You can hear how v2 misbehaves when speaking a bit louder or with long vowels, producing a monotone, robotic voice. Both tests were performed using an audio file to make sure that the input is the same.

I want to add a special note. Even though v1 works fine with ONNX files and rmvpe_onnx, PTH files still sound better, with minimal chances of getting a robotic voice, especially when used with rmvpe instead of rmvpe_onnx. Sadly, PTH models don't work on 2.0.40, as stated in a different open issue.

Amitaro Kuuko test.zip

w-okada commented 2 months ago

@Mikey-Mikey We have identified areas that might be causing the deterioration in audio quality. These issues have been corrected, so please try the new version (v.2.0.44-alpha).

Mikey-Mikey commented 2 months ago

Yep I was just about to tell you that the newest version is working.

Mikey-Mikey commented 2 months ago

Yay.

Mikey-Mikey commented 2 months ago

I should probably close this issue now. Everything that I've had an issue with is fixed now.

Mikey-Mikey commented 2 months ago

Everything has been fixed. So I'm closing this issue.

w-okada commented 2 months ago

@Mikey-Mikey @KuukoShan Thank you for your persistent cooperation in improving quality.

Kuuko-fokkusugaru commented 2 months ago

@Mikey-Mikey @KuukoShan Thank you for your persistent cooperation in improving quality.

We should be the ones thanking you for your hard work on this software (￣▽￣)
