Closed: lottev1991 closed this 5 months ago
We have done some experiments on the parameter, but no observable difference was found between the default threshold and your proposed value.
Perhaps we should collect more information on this issue. For example, which pitch extractor (PE) are you using, or which PEs have you tried? Does the choice of PE matter here? Currently in our Chinese community most people, including us, use RMVPE, and there is not yet any evidence that the threshold (or tension itself) affects the quality. I hope you (and other people as well) can provide more experimental results before we decide whether to modify the settings, and how.
Changing a parameter is not an easy thing. For example, if there are not many cases to support the change, we would rather make it a user-defined configuration than hard-code it; if the influence is wide and significant, then we can consider changing it directly in the code; otherwise, the default value tuned by the library author should still be preferred.
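To make the "user-defined configuration" option concrete, here is a minimal sketch of how a threshold could be read from a config with a fallback to the library default. Everything named here is hypothetical and only for illustration: the key `d4c_threshold`, the helper `resolve_d4c_threshold`, and the fallback value of 0.85 (assumed to match the upstream WORLD/pyworld default, which should be verified) are not taken from the DiffSinger code.

```python
# Hypothetical sketch: the config key, helper name, and fallback value are
# illustrative assumptions, not DiffSinger's actual API.

# Assumed upstream default for the D4C threshold; verify against the library in use.
FALLBACK_D4C_THRESHOLD = 0.85


def resolve_d4c_threshold(config: dict, key: str = "d4c_threshold") -> float:
    """Return the user-defined threshold if present, else the fallback default."""
    value = float(config.get(key, FALLBACK_D4C_THRESHOLD))
    # Reject values outside [0, 1] early, so a typo in the config
    # does not silently change the decomposition.
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{key} must be in [0, 1], got {value}")
    return value


if __name__ == "__main__":
    print(resolve_d4c_threshold({}))                       # falls back to the default
    print(resolve_d4c_threshold({"d4c_threshold": 0.25}))  # user override
```

With this shape, users who want the lower value can opt in through their config, while everyone else keeps the default tuned by the library author.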
A better harmonic-noise separation algorithm is introduced in #196, together with an important bug fix for array padding when the recording is longer than the label.
The issue described in this PR can be simply bypassed by using the new algorithm. Feel free to raise a new issue if new problems occur.
Hello all,
Recently, users from the DiffSinger community have been experimenting with lowering the threshold of the D4C waveform decomposition step found in binarizer_utils.py. The default setting for this is quite high, which can cause the following issues in models using variance parameters (tension and voicing in particular):
I've set the current threshold value to 0.25; there have been suggestions from the community to use an even lower value, though I have not tested that myself. The value of 0.25 has already significantly improved the quality of my latest model, which does support the tension parameter. This improvement in quality seems to be consistent across the board so far, with multiple positive reports from users. This is why I think a lower threshold should become the new default for waveform decomposition.
Initial findings were done by @UtaUtaUtau, who had this to say about it:
Regards,
Lotte V