Closed TeaPoly closed 2 years ago
Here is PR https://github.com/jzi040941/PercepNet/pull/43. It may need to be verified for correctness.
@TeaPoly Good idea. It looks like a normalization operation, which would definitely benefit quantization. However, it scales the dynamic range of pitch-corr to 0-1, which may not be good for model performance. From my perspective, the dynamic range of pitch-corr can reflect the accuracy of the estimated pitch, so in a sense the model can learn from the (large-range) pitch-corr and estimated pitch pair. However, I am not sure about this.
I found that FP16 training is hard to converge because the pitch correlation result is too large.
https://github.com/jzi040941/PercepNet/blob/29d041bafb3ec53b432271ef653bb90eea6f22c1/src/pitch.cpp#L385
But in Jean-Marc Valin's paper "A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet", the pitch correlation is between 0 and 1.
I think this could be improved.
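
To illustrate the range problem, here is a minimal sketch in plain Python. It is not the code from `pitch.cpp` or from the PR. The function names, the 480-sample frame length, and the synthetic signal are all assumptions made for illustration. It compares a raw (unnormalized) correlation with a correlation normalized by the energies of the two frames. By the Cauchy-Schwarz inequality the normalized value stays within [-1, 1]; getting a 0-1 range would additionally require clamping negative values, which is not shown here.

```python
import math
import random

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def raw_corr(x, y):
    """Unnormalized inner product of two equal-length frames."""
    return sum(a * b for a, b in zip(x, y))


def normalized_corr(x, y, eps=1e-9):
    """Correlation divided by the geometric mean of the frame energies.

    eps (hypothetical default) guards against division by zero on silence.
    """
    xy = raw_corr(x, y)
    xx = raw_corr(x, x)
    yy = raw_corr(y, y)
    return xy / math.sqrt(xx * yy + eps)


# Synthetic 16-bit-scale frame and a noisy, attenuated copy standing in
# for the pitch-delayed frame (both are assumptions, not real audio).
random.seed(0)
frame = [random.gauss(0.0, 3000.0) for _ in range(480)]
delayed = [0.8 * s + random.gauss(0.0, 600.0) for s in frame]

raw = raw_corr(frame, delayed)
norm = normalized_corr(frame, delayed)

print("raw exceeds FP16 range:", raw > FP16_MAX)
print("normalized within [-1, 1]:", -1.0 <= norm <= 1.0)
```

With inputs at 16-bit PCM scale, the raw value is far beyond what FP16 can represent, while the normalized value is independent of the input gain.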