maum-ai / nuwave2

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates @ INTERSPEECH 2022
https://mindslab-ai.github.io/nuwave2
BSD 3-Clause "New" or "Revised" License

Infer_step selection #18

Open yxlu-0102 opened 10 months ago

yxlu-0102 commented 10 months ago

I have been using your open-source code to perform 16 kHz to 48 kHz speech reconstruction. I used the default 8-step inference process and tested it on the untrimmed test set with your provided checkpoint.

However, the quality of the reconstructed speech is lower than expected. Specifically, there is a significant amount of noise in the high-frequency components. The SNR I obtained is 19.472 and the LSD is 1.212, whereas the paper reports an SNR of 24.0 and an LSD of 0.92.

I suspect the number of inference steps is too low. Could you please explain how to configure infer_steps and infer_schedule so that the results get closer to those reported in the paper?
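For readers comparing numbers like the SNR and LSD quoted above, here is a minimal sketch of how the two metrics are commonly computed. It is not taken from the nuwave2 repository: the function names, the STFT settings (`n_fft=2048`, `hop=512`, Hann window) and the `eps` stabilizer are assumptions, so absolute values may differ from the paper's evaluation script.

```python
import numpy as np

def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimate."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = estimate - reference
    # Small constant avoids division by zero for identical signals.
    return 10.0 * np.log10(np.sum(reference**2) / (np.sum(noise**2) + 1e-12))

def power_spectrogram(signal, n_fft=2048, hop=512):
    """Power spectrogram from a plain NumPy STFT (assumed settings)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < n_fft:
        signal = np.pad(signal, (0, n_fft - len(signal)))
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency, then mean over frames."""
    ref = power_spectrogram(reference, n_fft, hop)
    est = power_spectrogram(estimate, n_fft, hop)
    log_diff = np.log10((est + eps) / (ref + eps))
    return float(np.mean(np.sqrt(np.mean(log_diff**2, axis=1))))
```

Both functions expect two equal-length 1-D waveforms at the same sampling rate. Higher SNR and lower LSD indicate a closer match to the reference.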

jjunak-yun commented 2 months ago

Hello, @yxlu-0102 !

When I executed the 8-step inference process like you did, I noticed significant noise in the high-frequency range of the reconstructed speech. How many steps of the inference process did you perform to eliminate the noise or to achieve results similar to those in the paper?

I sincerely appreciate your help and hope that your advice will lead to better results.

Have a good day :)

yxlu-0102 commented 2 months ago

Hi,

I used the NU-Wave 2 checkpoint and inference code provided by the author of UDM+ in their repository. The performance was much better.

jjunak-yun commented 2 months ago

Hello, @yxlu-0102 !

Thank you so much for your quick and valuable response. I believe that by experimenting with the checkpoint you provided, I might achieve better results. 😊

Did you still set the inference steps to 8 with the new checkpoint?

In my case, with the new checkpoint and 8 inference steps, the output is still noisy. When I increase the inference steps to more than 50, the sound quality improves, but the LSD value exceeds 2, which suggests excessive denoising. 🥲

I'm curious to know the number of inference steps you used with the new checkpoint!

Thank you once again for your response. Have a great day. :)
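The trade-off described above (noise at 8 steps, a worse LSD beyond 50) is the kind of thing a small sweep over step counts can make visible. The sketch below is only an illustration: `sweep_steps`, `upsample_fn`, `metric_fn` and `toy_upsampler` are hypothetical names, and the toy upsampler is a placeholder that does not model NU-Wave 2 in any way. In practice `upsample_fn` would wrap whatever inference call your checkpoint uses.

```python
import numpy as np

def sweep_steps(upsample_fn, low_res, reference, metric_fn,
                step_counts=(8, 16, 32, 50)):
    """Score a (hypothetical) upsampler at several inference step counts."""
    scores = {}
    for steps in step_counts:
        estimate = np.asarray(upsample_fn(low_res, steps), dtype=np.float64)
        # Trim to a common length before scoring.
        n = min(len(estimate), len(reference))
        scores[steps] = metric_fn(reference[:n], estimate[:n])
    return scores

def toy_upsampler(low_res, steps):
    """Placeholder only: sample repetition plus noise that shrinks with steps."""
    rng = np.random.default_rng(0)
    upsampled = np.repeat(low_res, 3)  # 16 kHz -> 48 kHz by repetition
    return upsampled + rng.standard_normal(len(upsampled)) / steps

def mse(reference, estimate):
    """Mean squared error, used here as a stand-in metric."""
    return float(np.mean((reference - estimate) ** 2))
```

Calling `sweep_steps(toy_upsampler, low_res, reference, mse)` returns a dictionary keyed by step count, which makes it easy to see where a metric stops improving or starts to degrade.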

yxlu-0102 commented 2 months ago

Hi @naknak-Yun,

I set the number of inference steps to 50. Below are the results I reproduced:

[Image: Screenshot 2024-07-11 10 27 21 (reproduced results; image not preserved)]

jjunak-yun commented 2 months ago

Hi @yxlu-0102 ,

Thank you so much for your kind and quick response. Your answer has been very helpful for my research. Have a great day. 👍👍