joelle-o-world / mfcc

Calculate MFCC (Mel-frequency Cepstral Coefficients) from mic input in the browser. (TypeScript)

What is the behaviour of the appendEnergy parameter in python_speech_features MFCC? #10

Open joelle-o-world opened 4 years ago

joelle-o-world commented 4 years ago

":param appendEnergy: if this is true, the zeroth cepstral coefficient is replaced with the log of the total frame energy."

seems simple enough

joelle-o-world commented 4 years ago

if appendEnergy: feat[:,0] = numpy.log(energy) # replace first cepstral coefficient with log of frame energy

joelle-o-world commented 4 years ago

energy is the sum of the power spectrum bins in each frame. Zeros are replaced with machine epsilon (numpy.finfo(float).eps, which is not the smallest possible float) so that taking the log doesn't fail

energy = numpy.sum(pspec,1) # this stores the total energy in each frame
energy = numpy.where(energy == 0,numpy.finfo(float).eps,energy) # if energy is zero, we get problems with log
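
To check my understanding, here is a rough dependency-free sketch of the steps quoted above: sum the power spectrum per frame, floor zeros at machine epsilon, take the log, and overwrite coefficient 0. The helper names (log_frame_energy, append_energy) and the sample numbers are made up for illustration and are not part of python_speech_features. sys.float_info.epsilon has the same value as numpy.finfo(float).eps.

```python
import math
import sys


def log_frame_energy(pspec_frame):
    """Return log(total energy) of one frame, flooring zero at machine epsilon."""
    energy = sum(pspec_frame)
    if energy == 0:
        # log(0) is undefined, so substitute a tiny positive value instead
        energy = sys.float_info.epsilon
    return math.log(energy)


def append_energy(feat, pspec):
    """Overwrite coefficient 0 of every frame with that frame's log energy."""
    return [[log_frame_energy(p)] + list(row[1:]) for row, p in zip(feat, pspec)]


# Hypothetical data: two frames, three cepstral coefficients each.
feat = [[12.0, 0.5, -0.3], [7.0, 0.1, 0.2]]
pspec = [[0.25, 0.5, 0.25], [0.0, 0.0, 0.0]]  # second frame is silent

out = append_energy(feat, pspec)
print(out)
```

The first frame has total energy 1.0, so its coefficient 0 becomes log(1.0) = 0.0. The silent frame ends up with log(eps), which is about -36.04, so even the floor value gives a negative number.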

joelle-o-world commented 4 years ago

A confusing point: appendEnergy is supposedly enabled by default. It should take the energy (the sum of the power spectrum) and overwrite the first coefficient with it. The energy should always be a positive number, as it is a sum of squares. However, in the example in the Google Colab notebook (https://colab.research.google.com/drive/1Nj59U3EIeSUHKcLebf0vAXeZxrE2h7qy) the first coefficient is -10.9531314.

Something fishy is going on...
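
One possible explanation, not verified against the notebook: the value written into coefficient 0 is the log of the energy, not the energy itself, and the log of any positive number below 1 is negative. The sketch below just inverts the log on the reported value to see what frame energy it would imply. Whether the notebook's energies are really that small depends on how the audio samples were scaled, which is an assumption here.

```python
import math

observed_c0 = -10.9531314  # first coefficient reported in the notebook

# If c0 = log(energy), then energy = exp(c0).
implied_energy = math.exp(observed_c0)

print(implied_energy)          # positive, but far below 1
print(implied_energy > 0)      # a sum of squares can still produce this
print(math.log(implied_energy))  # taking the log again recovers the negative value
```

The implied energy is roughly 1.75e-5, which is still a valid positive sum of squares. So a negative first coefficient does not by itself contradict appendEnergy being on.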