Closed Mikey-Mikey closed 2 months ago
also could you try just talking instead of singing?
because my issue kinda goes away when yelling / singing.
Are you using the model I provided earlier? In my environment, it seems to be converting quite well.
What F0 estimator are you using? I recommend rmvpe_onnx.
output.mp4
Hello there!
I am having the same issue. I have included a zip file where you can compare how it sounds. The source was a WAV file I recorded myself to make sure that the comparison was made properly. I have tried to match the settings between v1 and v2. The distortion between the ONNX file and the PTH file happens even in v1, but very slightly. The distortion in v2 is so extreme that it is unusable (why does ONNX have lower quality than PTH models?). I converted the PTH file to ONNX using v1 and used the same files across all the versions. I ran 6 tests with 3 different software versions, including v1, and with 2 file types for the same model (PTH and ONNX). I hope this is helpful.
The file names contain the necessary information. For example: "v1.5.3.18a [pth] (rmvpe 192 - 131072).wav"
The first part is the version. Between square brackets [] is the file type of the voice model. Between parentheses are the F0 detection used, the chunk size, and the extra size.
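For anyone who wants to sort or compare the test files programmatically, here is a minimal sketch that parses the naming convention described above. The function name `parse_test_filename` and the `ParsedName` type are hypothetical, and the pattern is only inferred from the single example file name given, so other files in the zip may need adjustments.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    """Fields encoded in a test file name (hypothetical helper type)."""
    version: str      # software version, e.g. "v1.5.3.18a"
    model_type: str   # voice model file type, e.g. "pth" or "onnx"
    f0_detector: str  # F0 detection used, e.g. "rmvpe"
    chunk: int        # chunk size
    extra: int        # extra size


# Pattern inferred from: "v1.5.3.18a [pth] (rmvpe 192 - 131072).wav"
_NAME_PATTERN = re.compile(
    r"^(?P<version>\S+)\s+"
    r"\[(?P<model_type>[^\]]+)\]\s+"
    r"\((?P<f0>\S+)\s+(?P<chunk>\d+)\s*-\s*(?P<extra>\d+)\)"
    r"\.wav$"
)


def parse_test_filename(name: str) -> ParsedName:
    """Split a test file name into version, model type, and settings."""
    match = _NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return ParsedName(
        version=match["version"],
        model_type=match["model_type"],
        f0_detector=match["f0"],
        chunk=int(match["chunk"]),
        extra=int(match["extra"]),
    )


if __name__ == "__main__":
    print(parse_test_filename("v1.5.3.18a [pth] (rmvpe 192 - 131072).wav"))
```

Parsing the names this way makes it easy to group the six recordings by version or by model type before listening to them side by side.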
I am using an Nvidia 3080Ti GPU and an i7 13700K CPU.
Kuuko audio test.zip
Sorry, in my environment and with my hearing, I can't tell the difference.
https://github.com/w-okada/voice-changer/assets/48346627/5bdd1ce3-082c-4924-ad49-db9eb27560b5
https://github.com/w-okada/voice-changer/assets/48346627/6dcb8ca9-8ff6-44dd-9312-3bd152eabbf0
I will try to make an audio example that is more challenging for the software. But it's really weird if you can't notice the difference in the files that I sent. My ONNX file in v2 sounds like a broken Vocaloid lol. It's very noticeable when I speak a bit louder. It does not happen in v1 with the same WAV input file.
Is there any chance that ONNX in v2 is broken on my side? Because when I load an ONNX model, I get some yellow messages in the console. I will try to take a screenshot. Those yellow messages do not appear when loading PTH files, but I need to do more testing first.
Sorry, in my environment and with my hearing, I can't tell the difference.
myouou_pth.rmvpe_onnx.24000.16320.mp4 myouou_onnx.rmvpe_onnx.24000.16320.mp4
Interesting that you're not getting the autotune issue but I am.
I was unable to reproduce the issue clearly. If possible, could you share the audio data that causes the problem?
The issue happens when you have a high extra. And the higher the Extra amount is the worse it gets. Also I'm now using 2.0.40 alpha and there's some loud popping noise which I think is due to the console saying NaN on the out side of the vol section.
the nan issue only happens when the decibels get really low or silent.
Here's the nan issue.
I'm gonna test the autotune issue real quick on my new 1000 epoch male_07 model that uses the TITAN-Medium pretrain.
Nope still having an autotune issue. It happens when you ramp your own voice's pitch up when saying something like "hello"
@Mikey-Mikey
The issue happens when you have a high extra. And the higher the Extra amount is the worse it gets. Also I'm now using 2.0.40 alpha and there's some loud popping noise which I think is due to the console saying NaN on the out side of the vol section.
Did this issue not occur in versions prior to v 2.0.40? Starting from v 2.0.40, we made the volume adjustment method a bit more aggressive, which might have caused this issue.
Only the Extra autotune issue has happened in previous versions. The popping and NaN issue has so far only been in 2.0.40.
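To illustrate why a volume adjustment step can misbehave on very quiet or silent input, here is a minimal sketch. It is not the voice changer's actual code: the function `normalize_chunk` and the parameters `target_rms`, `silence_floor`, and `max_gain` are all hypothetical. The general point is that a gain computed as target level divided by measured level blows up as the measured level approaches zero (with floating-point arrays, 0/0 gives NaN), so a guard for silent chunks and a cap on the gain are common precautions.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a chunk; 0.0 for an empty chunk."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def normalize_chunk(
    samples: Sequence[float],
    target_rms: float = 0.1,      # hypothetical target level
    silence_floor: float = 1e-4,  # hypothetical threshold for "silent"
    max_gain: float = 10.0,       # hypothetical cap on amplification
) -> List[float]:
    """Scale a chunk toward target_rms without dividing by (near) zero."""
    level = rms(samples)
    if level < silence_floor:
        # Silent or near-silent chunk: pass it through unchanged instead of
        # computing target_rms / level, which would be huge or undefined.
        return list(samples)
    # Cap the gain so quiet chunks are not amplified into loud pops.
    gain = min(target_rms / level, max_gain)
    return [s * gain for s in samples]


if __name__ == "__main__":
    print(normalize_chunk([0.0, 0.0, 0.0, 0.0]))  # silent chunk stays silent
    print(normalize_chunk([0.05, -0.05]))         # scaled toward target_rms
```

Whether the popping reported here comes from this kind of division is an assumption; the sketch only shows one way such a symptom can arise and be guarded against.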
I have this issue not only with my model (which was a PTH file converted to ONNX) but also with the included ones. I am including two files for easy comparison: one from v1 and another from v2. This time I used the Amitaro voice, where the differences are clearer. I am using similar settings for both, but v2 has a slightly bigger chunk size, which should result in higher quality anyway (but it clearly does not). You can hear how v2 misbehaves when speaking a bit louder or with long vowels, producing a monotone and robotic voice. Both tests were performed using an audio file to make sure that the input is the same.
I want to add a special note. Even though v1 works fine with ONNX files and rmvpe_onnx, PTH files still sound better, with minimal chance of getting a robotic voice, especially when used with rmvpe instead of rmvpe_onnx. Sadly, PTH models don't work on 2.0.40, as stated in a different open issue.
@Mikey-Mikey We have identified areas that might be causing the deterioration in audio quality. These issues have been corrected, so please try the new version (v.2.0.44-alpha).
Yep I was just about to tell you that the newest version is working.
Yay.
I should probably close this issue now. Everything that I've had an issue with is fixed now.
Everything has been fixed. So I'm closing this issue.
@Mikey-Mikey @KuukoShan Thank you for your persistent cooperation in improving quality.
We should be the ones thankful for your hard work on this software ( ̄▽ ̄)
Voice Changer Version
vcclient_win_cuda_2.0.24-alpha.zip
Operational System
Windows 10
GPU
RTX 2060
CUDA Version
CUDA 12.5.78
Read carefully and check the options
Model Type
RVC
Issue Description
My issue right now is that it just sounds terrible, repeats itself over and over, and is really loud. There are no error logs, or any logs, period. I've tried switching the output off and it still repeats over and over and sounds really bad. I'm also using the exact same model, which I trained myself, that I used on v1.
Application Screenshot
Logs on console
None.