Open yxlu-0102 opened 11 months ago
Thank you for your interest in our work. Please could you elaborate on your reproduction process, including...
How did you calculate the LSD and other metrics? Did you use the method we provided, or another library or software?
You mentioned using the "last 8 speakers", which doesn't seem to match the test set we used. Could you please elaborate on your test set partitioning method?
If possible, could you provide your reproduction results, including file names and scores?
I used the metric_calculator you provided, but I changed n_fft to 2048 for a fair comparison with other systems.
The systems you compared with in your paper (e.g., NU-wave2 and UDM+) used VCTK-0.92 as the dataset, and their test set contains the last 8 speakers, so I used the same test set for a fair comparison.
For example, for the 24 kHz to 48 kHz experiment, the metrics I calculated are an LSD of 0.72 and an SNR of 25.86. The metrics in your paper are an LSD of 0.61 and an SNR of 26.26.
Hello, does this model support real-time speech super-resolution? For example, by splitting long speech into multiple 16 ms chunks for processing and merging them at the output end?
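To make the chunked idea in the question concrete, here is a minimal sketch of splitting a waveform into 16 ms pieces, processing each one, and concatenating the results. `process_chunk` is a hypothetical stand-in for whatever per-chunk model call would be used; it is not a function from this repository, and the sketch says nothing about whether the model gives good quality or low enough latency on chunks this short.

```python
import numpy as np

def process_in_chunks(wave, sr, process_chunk, chunk_ms=16):
    """Split `wave` into fixed-length chunks, process each, and concatenate.

    `process_chunk` is a hypothetical callable (assumption): it takes a 1-D
    array and returns a 1-D array, e.g. an upsampled version of the chunk.
    """
    wave = np.asarray(wave, dtype=np.float32)
    chunk_len = int(sr * chunk_ms / 1000)  # 16 ms at 24 kHz -> 384 samples
    outputs = []
    for start in range(0, len(wave), chunk_len):
        chunk = wave[start:start + chunk_len]  # the last chunk may be shorter
        outputs.append(process_chunk(chunk))
    return np.concatenate(outputs) if outputs else wave[:0]
```

Plain concatenation like this can produce audible discontinuities at chunk boundaries, so a practical streaming setup would likely need overlapping chunks with cross-fading or some carried-over context.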
I synthesised waveforms with your official checkpoint on the test set of VCTK-Corpus-0.92, which contains the audio clips of the last 8 speakers.
I calculated the LSD and SNR scores between the generated and reference audio, but the calculated metrics are not as good as those in your paper.
Additionally, the LSD calculation in `util.util.compute_metrics` seems strange: n_fft should be 2048, while your default setting is 1024.
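Since the discussion turns on the n_fft setting, here is a minimal NumPy sketch of how LSD and SNR are commonly computed, with n_fft exposed as a parameter. This is not the repository's `compute_metrics`; the Hann window, the hop length of 512, the `eps` floor, and the function names are all assumptions made for illustration.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram from Hann-windowed frames, shape (frames, n_fft // 2 + 1)."""
    x = np.asarray(x, dtype=np.float64)
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))  # zero-pad very short signals
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def lsd(ref, est, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency of the log-power difference,
    averaged over frames."""
    n = min(len(ref), len(est))
    p_ref = stft_power(ref[:n], n_fft, hop)
    p_est = stft_power(est[:n], n_fft, hop)
    diff = np.log10(p_ref + eps) - np.log10(p_est + eps)
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))

def snr(ref, est, eps=1e-10):
    """Time-domain signal-to-noise ratio in dB."""
    n = min(len(ref), len(est))
    ref = np.asarray(ref[:n], dtype=np.float64)
    est = np.asarray(est[:n], dtype=np.float64)
    noise = ref - est
    return float(10 * np.log10(np.sum(ref ** 2) / (np.sum(noise ** 2) + eps)))
```

Because n_fft changes the frequency resolution and the number of frames being averaged, LSD values computed with 1024 and 2048 are not directly comparable, so scores should only be compared when both sides use the same setting.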