pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch
https://pytorch.org/audio
BSD 2-Clause "Simplified" License

LPC analysis for speech signals #227

Open ahmed-fau opened 5 years ago

ahmed-fau commented 5 years ago

Hi, are there any plans to add LPC analysis to torchaudio, i.e., a feature that computes the LPC parameters of speech signals? Or at least a way to convert MFCCs to LPC?

cpuhrsch commented 5 years ago

@ahmed-fau, I assume by this you mean features such as "Perceptual Linear Predictive (PLP) Analysis of Speech"?

ahmed-fau commented 5 years ago

@cpuhrsch PLP is an extension of LPC that incorporates perceptual modeling into linear prediction. LPC itself is simpler: you only estimate the analysis filter parameters from the autocorrelation matrix of a speech frame in the time domain. This is already available in Python via AudioLazy, so it could be interfaced with torchaudio to provide the LPC analysis/synthesis features that current generative models for speech synthesis rely on (e.g., LPCNet).
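To make the request concrete, here is a minimal NumPy sketch of the autocorrelation method described above. It is neither torchaudio's nor AudioLazy's API; the function name `lpc_autocorr` and its parameters are made up for illustration. It assumes a single, already-extracted speech frame and returns predictor coefficients `a` such that `x[n] ≈ sum_k a[k] * x[n-1-k]`.

```python
import numpy as np


def lpc_autocorr(frame, order):
    """Hypothetical helper: LPC predictor coefficients of one frame.

    Uses the autocorrelation method: build the Toeplitz autocorrelation
    matrix of the frame and solve the normal equations R a = r.
    """
    frame = np.asarray(frame, dtype=np.float64)
    n = len(frame)
    # Autocorrelation of the frame at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz autocorrelation matrix with entries R[i, j] = r[|i - j|].
    lags = np.abs(np.arange(order)[:, None] - np.arange(order)[None, :])
    R = r[lags]
    # Predictor coefficients: x[n] is approximated by sum_k a[k] * x[n-1-k].
    return np.linalg.solve(R, r[1 : order + 1])
```

In practice the frame would usually be windowed first (e.g., with a Hamming window), and the Toeplitz structure of `R` allows a faster solve than a general `np.linalg.solve`; both are left out here to keep the sketch short.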

cpuhrsch commented 5 years ago

@ahmed-fau - and presumably also used as a building block for PLP?

Adding @vincentqb

ahmed-fau commented 5 years ago

In principle, PLP applies linear prediction within its estimation pipeline. However, I am not sure whether a toolkit like Kaldi exposes linear prediction directly for that purpose. If it does, then I think all we would need is a plain interface to that functionality through torchaudio.

ahmed-fau commented 5 years ago

@cpuhrsch if you look at Fig. 1 of the original PLP paper, you will see that linear prediction is a dedicated step in the estimation process. The last block in the diagram (called "solution for auto-regressive coefficients") is exactly the functionality I am talking about. When you apply this block directly to the raw speech waveform (i.e., without all the preceding blocks), the resulting coefficients are the LPC coefficients.
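As an illustration of that last block applied straight to a waveform frame, here is a rough sketch using the Levinson-Durbin recursion, a standard way to solve for autoregressive coefficients from an autocorrelation sequence. Whether the PLP paper or Kaldi uses exactly this recursion is an assumption on my part, and the names `levinson_durbin` and `lpc_from_frame` are hypothetical. Note the sign convention: the returned `a` holds the prediction-error filter `A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p`, so the predictor coefficients are `-a[1:]`.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve for AR coefficients from autocorrelation values r[0..order].

    Returns (a, err): a[0] == 1 and a[1:] are the prediction-error filter
    coefficients; err is the final prediction error energy.
    """
    r = np.asarray(r, dtype=np.float64)
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient for this order
        prev = a.copy()
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_from_frame(frame, order):
    """Apply the AR solution directly to a raw waveform frame (plain LPC)."""
    frame = np.asarray(frame, dtype=np.float64)
    n = len(frame)
    # Autocorrelation of the raw frame at lags 0..order.
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    return levinson_durbin(r, order)
```

For example, `levinson_durbin([1.0, 0.5, 0.25], 2)` corresponds to a first-order process with coefficient 0.5, so the second-order term comes out as zero.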