wenyong-h closed this issue 5 years ago
This is code written by @jmvalin, but I have been using it for a while.
Yes, [2*NB_BANDS] is the pitch period, with a little scaling; [2*NB_BANDS+1] is the pitch predictor gain, and can be interpreted as a measure of voicing. [2*NB_BANDS+2] is the LPC model energy, which isn't actually used in the model (it's zeroed out in train_lpcnet.py).
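For readers who want to pull these three values out of a feature frame, here is a minimal sketch of the index layout described above. It assumes NB_BANDS is 18 (matching the 18 cepstral coefficients mentioned in this thread); the function name `split_pitch_features` is hypothetical and not part of LPCNet, and the sketch does not undo the scaling applied to the pitch period.

```python
# Assumption: NB_BANDS = 18, matching the 18 cepstral coefficients
# mentioned in this thread. Check the value in your LPCNet checkout.
NB_BANDS = 18

PITCH_PERIOD_IDX = 2 * NB_BANDS      # pitch period (stored with some scaling)
PITCH_GAIN_IDX = 2 * NB_BANDS + 1    # pitch predictor gain (voicing measure)
LPC_ENERGY_IDX = 2 * NB_BANDS + 2    # LPC model energy (zeroed in train_lpcnet.py)


def split_pitch_features(frame):
    """Return (scaled_pitch_period, pitch_gain, lpc_energy) from one frame.

    `frame` is any indexable sequence holding one feature vector.
    This is a hypothetical helper for illustration only.
    """
    if len(frame) <= LPC_ENERGY_IDX:
        raise ValueError(
            f"frame has {len(frame)} values, need at least {LPC_ENERGY_IDX + 1}"
        )
    return (
        frame[PITCH_PERIOD_IDX],
        frame[PITCH_GAIN_IDX],
        frame[LPC_ENERGY_IDX],
    )
```

Calling `split_pitch_features` on a frame whose entries equal their own indices would return the values at positions 36, 37 and 38 under the NB_BANDS = 18 assumption.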
BTW a couple of new pitch estimators are currently being trialled with LPCNet.
Thanks for your explanation. One more question: do you think LPCNet features (18 cepstral coefficients plus 2 pitch parameters) lose useful information compared to Tacotron features (80-band mel-scale spectrogram features)?
An 18-band cepstrum definitely loses some of the information in an 80-band spectrogram, mostly related to pitch (which is why a good pitch estimator is useful). OTOH, it also does a good job at decoupling the spectral shape from the pitch. That being said, LPCNet should work fine with 80-band Tacotron features if you prefer using that.
Thanks for your great explanations, that's really helpful.
@jmvalin Hi, great job! I find it hard to understand the pitch extraction process in pitch.c. Could you give a more detailed explanation of the algorithm? Another question: the pitch period ranges from 32 to 256, which means the pitch is in [62.5 Hz, 500 Hz]. Is that good enough for children's speech?
@jmvalin, @drowe67 In your code, https://github.com/mozilla/LPCNet/blob/b811cade95cf19530ddfbe5aadeaff7c4f89ba77/src/dump_data.c#L160-L162 is the first line the pitch period? Is the second line the pitch correlation, and what does it stand for? What is the third line? Also, the pitch estimation code is hard to read without comments. Could you provide some explanation, or links to related papers or materials?
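The period-to-frequency arithmetic in the question above can be checked with a short sketch. It assumes a 16 kHz sampling rate, which is what the quoted figures imply (a period of 32 samples gives 500 Hz and a period of 256 samples gives 62.5 Hz); the function name `period_to_f0` is hypothetical.

```python
# Assumption: 16 kHz sampling rate, implied by the figures quoted above.
SAMPLE_RATE_HZ = 16000


def period_to_f0(period_samples, sample_rate_hz=SAMPLE_RATE_HZ):
    """Convert a pitch period in samples to a fundamental frequency in Hz."""
    if period_samples <= 0:
        raise ValueError("pitch period must be a positive number of samples")
    return sample_rate_hz / period_samples


# Shortest period gives the highest pitch, longest period the lowest.
highest_f0 = period_to_f0(32)
lowest_f0 = period_to_f0(256)
```

Under that assumption, any voice with a fundamental above `highest_f0` falls outside the searchable period range, which is the concern raised about children's speech.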
By the way, thanks for sharing your excellent work!