pitch correlation is too large

jzi040941 / PercepNet

Unofficial implementation of PercepNet: A Perceptually-Motivated Approach for Low-Complexity, Real-Time Enhancement of Fullband Speech

BSD 3-Clause "New" or "Revised" License

333 stars 92 forks

pitch correlation is too large #42

Closed. TeaPoly closed this issue 2 years ago

TeaPoly commented 2 years ago

I found that FP16 training is hard to converge because the pitch correlation value is too large.

https://github.com/jzi040941/PercepNet/blob/29d041bafb3ec53b432271ef653bb90eea6f22c1/src/pitch.cpp#L385

But in Jean-Marc Valin's paper "A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet", the pitch correlation is between 0 and 1:

LPCNet is designed to operate with 10-ms frames. Each frame includes 18 cepstral coefficients, a pitch period (between 16 and 256 samples), and a pitch correlation (between 0 and 1).

I think this can be improved.
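To make the concern concrete, here is a minimal sketch in plain Python. It assumes the value in question behaves like an unnormalized inner product between a frame and its lagged copy, which has not been checked against `pitch.cpp`. The function names, the 480-sample frame, and the sine test signal are hypothetical and chosen only for illustration. The one fixed fact it relies on is that the largest finite FP16 value is 65504.

```python
import math

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def raw_pitch_corr(x, lag):
    """Unnormalized inner product between the signal and its lagged copy."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def normalized_pitch_corr(x, lag, eps=1e-9):
    """Inner product divided by the energies of both segments, clamped to [0, 1]."""
    cur = x[lag:]
    past = x[:len(x) - lag]
    xy = sum(a * b for a, b in zip(cur, past))
    xx = sum(a * a for a in cur)
    yy = sum(b * b for b in past)
    corr = xy / math.sqrt(xx * yy + eps)  # eps guards against silent frames
    return min(1.0, max(0.0, corr))


# Hypothetical test frame: a sine with a 100-sample period at 16-bit PCM scale.
period = 100
frame = [10000.0 * math.sin(2.0 * math.pi * n / period) for n in range(480)]

raw = raw_pitch_corr(frame, period)
norm = normalized_pitch_corr(frame, period)

print("raw exceeds FP16 range:", raw > FP16_MAX)
print("normalized value:", norm)
```

With PCM-scaled input the raw value is far outside what FP16 can represent, while the normalized value stays in the 0 to 1 range quoted from the paper.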

TeaPoly commented 2 years ago

Here is a PR: https://github.com/jzi040941/PercepNet/pull/43. It may need to be verified for correctness.

nicriverhoo commented 2 years ago

@TeaPoly Good idea. It looks like a normalization operation, which definitely benefits quantization. However, it scales the dynamic range of pitch-corr to 0-1, which may not be good for model performance. From my perspective, the dynamic range of pitch-corr can reflect the accuracy of the estimated pitch, so in a sense the model can learn from the differences within each (large-range) pitch-corr and estimated-pitch pair. However, I am not sure about this.
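One way to look at this trade-off is to separate the two things a raw correlation can vary with: signal level and lag quality. The sketch below does that in plain Python, again assuming an unnormalized inner product and using hypothetical names and a synthetic sine. It does not settle whether the model benefits from the larger range; it only shows what each variant responds to under these assumptions.

```python
import math


def raw_corr(x, lag):
    """Unnormalized inner product between the signal and its lagged copy."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def norm_corr(x, lag, eps=1e-9):
    """Raw correlation divided by the energies of the two segments."""
    xx = sum(v * v for v in x[lag:])
    yy = sum(v * v for v in x[:len(x) - lag])
    return raw_corr(x, lag) / math.sqrt(xx * yy + eps)


# Hypothetical frames: the same periodic signal at two loudness levels.
period = 100
quiet = [100.0 * math.sin(2.0 * math.pi * n / period) for n in range(480)]
loud = [10.0 * v for v in quiet]

# Same lag, different level: the raw value grows with the square of the gain.
raw_ratio = raw_corr(loud, period) / raw_corr(quiet, period)

# Same level, different lag: the normalized value separates a good lag
# (one full period) from a poor one (a quarter period).
good = norm_corr(quiet, period)
poor = norm_corr(quiet, period // 4)

print("raw ratio for 10x gain:", raw_ratio)
print("normalized at good lag:", good)
print("normalized at poor lag:", poor)
```

In this toy setup, the extra range of the raw value follows signal energy, while the normalized value still distinguishes a well-matched lag from a poorly matched one. Whether real training data behaves the same way would need to be checked against the PR.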