keunwoochoi / kapre

kapre: Keras Audio Preprocessors
MIT License

Missing conversion from mel-spectrogram layer to LogmelToMFCC layer? #106

Mxgra closed this issue 4 years ago

Mxgra commented 4 years ago

It is I, again. I hope it's okay to bug you here with a general question.

I want to use kapre layers to convert an audio input to MFCCs directly in the model. I spent some time digging through the docs and there is one thing I don't understand. To my understanding, we would get a mel-spectrogram (I am currently using composed.get_melspectrogram_layer), take the log of the result, and feed this into the LogmelToMFCC layer, which then outputs the MFCCs to be used further in the model.

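import kapre
from kapre.composed import get_melspectrogram_layer
from tensorflow.keras.models import Sequential

model = Sequential()
input_shape = (16000, 1)  # (samples, channels): one second of mono audio at 16 kHz
melgram_layer = get_melspectrogram_layer(input_shape=input_shape, n_fft=2048, win_length=2018, hop_length=1024,
                                         #n_mels=40,
                                         input_data_format='channels_last', output_data_format='channels_last',
                                         sample_rate=16000, name='melspectro_layer')
model.add(melgram_layer)
# Missing conversion to log
model.add(kapre.LogmelToMFCC(n_mfccs=40))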

How do I convert the output of the melgram layer to log? Is there something I'm missing?

keunwoochoi commented 4 years ago

No problem. Have you checked out the documentation website? Anyway, you can do it by passing return_decibel=True to get_melspectrogram_layer.

More details: https://kapre.readthedocs.io/en/latest/composed.html#kapre.composed.get_melspectrogram_layer

Mxgra commented 4 years ago

oooh I didn't realize that decibel scale is equivalent to log scale, thank you very much!

Yes, I looked through the docs and found them very comprehensive. If I had made the connection between decibel and log scale, everything would have been very clear 👌
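To make the decibel/log connection concrete: a decibel value is a scaled base-10 logarithm of power, typically 10 * log10(power / reference). The NumPy sketch below illustrates that relationship only; it is not kapre's code. The helper name `power_to_decibel` and its parameters (`ref_value`, `amin`, `dynamic_range`) are hypothetical, chosen here for illustration.

```python
import numpy as np

def power_to_decibel(power, ref_value=1.0, amin=1e-5, dynamic_range=80.0):
    """Illustrative power-to-dB conversion (hypothetical helper, not kapre's API)."""
    power = np.asarray(power, dtype=np.float64)
    # Clamp tiny values so the logarithm stays finite, then take 10 * log10.
    log_spec = 10.0 * np.log10(np.maximum(power, amin))
    # Express the result relative to the reference value.
    log_spec -= 10.0 * np.log10(np.maximum(amin, ref_value))
    # Limit the range to `dynamic_range` dB below the maximum.
    return np.maximum(log_spec, log_spec.max() - dynamic_range)

power = np.array([1.0, 10.0, 100.0, 1000.0])
decibels = power_to_decibel(power)
print(decibels)  # each tenfold increase in power adds 10 dB
```

So a decibel-scaled mel-spectrogram is a log-mel-spectrogram up to a constant factor and offset, plus clipping at the low end.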