keunwoochoi / kapre

kapre: Keras Audio Preprocessors
MIT License

version 0.1.7 inconsistent with librosa and torchaudio for spectrogram and melspectrogram #68

Closed: tbright17 closed this issue 4 years ago

tbright17 commented 4 years ago

Versions: tensorflow 1.15.2, kapre 0.1.7, librosa 0.7.2, torchaudio 0.4.0, pytorch 1.4

Here is the script:

import tensorflow as tf
import tensorflow.keras as keras
import kapre
import torch
import torchaudio
import soundfile as sf
import numpy as np
import librosa

from kapre.time_frequency import Melspectrogram, Spectrogram

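def tf_mel_spec(waveform, sr):
    # Despite the name, this currently builds a linear-frequency power spectrogram;
    # the Melspectrogram variant is commented out below.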
    src = np.random.random((1, sr*3))
    model = keras.models.Sequential()
    # model.add(Melspectrogram(sr=sr, n_mels=128, 
    #         n_dft=512, n_hop=256, input_shape=src.shape, 
    #         return_decibel_melgram=False,
    #         trainable_kernel=False, name='melgram'))
    model.add(Spectrogram(n_dft=512, n_hop=256, input_shape=src.shape,
                          power_spectrogram=2.0, return_decibel_spectrogram=False,
                          trainable_kernel=False, name='stft'))

    n_ch, nsp_src = model.input_shape[1:]
    waveform = waveform[:nsp_src]
    src_batch = waveform[np.newaxis, np.newaxis, :]
    pred = model.predict(x=src_batch)

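    # Output is (batch, freq, time, channel); keep (freq, time) of the first item.
    result = pred[0, :, :, 0]
    # The layer already returns power (power_spectrogram=2.0), so this only takes abs().
    result = librosa.magphase(result, power=1)[0]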

    return result

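def pt_mel_spec(waveform, sr):
    # Also a linear power spectrogram despite the name; torchaudio's defaults give
    # win_length = n_fft, hop_length = n_fft // 2 = 256 and power = 2.0.
    #return torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_fft=512)(waveform)
    return torchaudio.transforms.Spectrogram(n_fft=512)(waveform)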

if __name__ == '__main__':
    torch.set_printoptions(precision=10)

    waveform, sr = sf.read('audio_files/0a5cbf90.wav')
    waveform_pt, sr = torchaudio.load('audio_files/0a5cbf90.wav')
    S = librosa.core.stft(waveform[:sr*3], n_fft=512, hop_length=256)
    magnitudes_librosa = librosa.magphase(S, power=2)[0]
    feat_tf = tf_mel_spec(waveform, sr)
    feat_pt = pt_mel_spec(waveform_pt[:,:sr*3], sr)[0,:,:]
    print(magnitudes_librosa[:,100])
    print(feat_tf[:,100])
    print(feat_pt[:,100])

And the output (column 100 of each spectrogram) is:

librosa:
[1.10851852e-06 8.74158843e-07 8.54210521e-05 3.19121718e-05
 3.50781775e-04 5.70360688e-04 4.71506442e-04 1.33082736e-04
 2.36166801e-04 2.67683784e-03 2.65861806e-02 1.65472374e+01
 9.82200775e+01 3.58395729e+01 8.73229876e-02 3.69395385e-03
 6.59208090e-05 2.03487216e-04 1.80396601e-04 6.96523712e-05
 5.19357709e-05 5.27438933e-05 7.63353273e-06 8.88889372e-06
 9.74568757e-05 1.61121425e-05 2.22577910e-06 1.53969177e-05
 9.81121593e-06 8.71325426e-07 2.67018072e-06 4.10762323e-06
 8.03146486e-06 9.26713483e-07 1.35341679e-05 1.18012777e-05
 1.65971062e-06 6.05958621e-06 2.69215070e-06 6.97137557e-06
 9.70695783e-06 2.38316170e-06 4.25549530e-07 2.39474434e-06
 8.15638941e-06 3.31711431e-06 4.06233085e-06 1.64566268e-06
 1.73159460e-05 1.08908116e-05 1.85551050e-06 6.25933399e-06
 5.03402953e-06 3.11271606e-06 7.53328277e-06 3.82949747e-06
 2.81360894e-06 1.42839542e-06 4.06417342e-07 1.11502613e-05
 1.87023034e-05 8.63777473e-07 3.61659147e-08 3.50658956e-06
 1.10603705e-05 2.16924650e-06 6.67423774e-06 6.06137019e-06
 4.19968019e-05 5.55325169e-05 3.79408880e-06 2.05031392e-06
 6.11004498e-06 2.32283878e-06 1.09293035e-06 1.72524409e-07
 1.71061595e-06 3.29353810e-07 1.43219091e-04 5.39098692e-04
 1.12343310e-04 1.26253701e-06 2.39022734e-06 5.66227129e-07
 4.30786986e-06 1.27641260e-05 7.18868841e-06 1.11864119e-06
 5.58383317e-06 1.75928903e-06 8.16781358e-06 1.31222976e-06
 6.11735061e-07 2.25175609e-06 1.69795928e-08 4.93276002e-06
 9.65972868e-06 4.09533686e-06 1.32710136e-06 3.37568849e-07
 1.18182186e-06 1.50641858e-06 2.41005955e-06 1.92551079e-06
 7.37560242e-07 3.81200067e-07 8.52214157e-08 8.81923540e-07
 2.89083414e-06 1.10792816e-05 9.36589458e-06 2.50758080e-06
 4.44088897e-07 4.75126626e-06 1.10572537e-05 1.36874005e-05
 7.68860173e-06 3.80657070e-06 3.58905072e-06 1.74703166e-06
 3.85216345e-06 2.29458601e-06 4.32009301e-07 2.19829872e-06
 9.62010517e-07 1.26382372e-06 3.93125947e-06 9.76325646e-06
 7.95094184e-06 5.47757099e-06 2.87542093e-06 2.54979932e-06
 1.10524434e-06 1.81150608e-06 5.59380805e-06 4.75777006e-06
 4.26812176e-06 1.86344857e-06 3.44113323e-06 3.79254061e-06
 5.12102042e-06 4.95977338e-06 3.64204152e-06 8.40335133e-07
 1.93536831e-08 1.49700065e-06 3.91529312e-07 6.21762138e-06
 2.40601839e-05 5.91599292e-06 1.19529004e-05 1.63506156e-05
 7.30572992e-07 3.14481781e-06 3.86763941e-06 1.38325600e-08
 2.13087856e-06 2.59206189e-07 3.32055379e-06 9.12301516e-07
 4.94135861e-07 1.57219279e-06 8.03949570e-06 3.78759523e-06
 1.07121159e-05 6.74604416e-06 7.86719011e-06 5.09817573e-06
 3.12496650e-06 1.81980226e-07 1.40982962e-07 4.19739354e-06
 8.66496066e-06 3.18054913e-06 3.26565601e-06 2.28833733e-06
 7.80576283e-06 1.26662985e-06 2.01688636e-06 2.40373629e-06
 8.70216263e-06 4.43025925e-07 2.59171111e-06 4.25037933e-06
 1.58913444e-05 8.23213304e-06 2.25644226e-06 3.62483370e-05
 7.67130259e-05 3.94823292e-05 6.55664007e-06 6.15359966e-07
 4.70838168e-09 2.41760017e-06 1.05583704e-05 2.90749194e-05
 4.09411696e-05 1.35867786e-05 5.81132235e-06 9.09590017e-06
 2.08544698e-05 5.20353296e-06 2.25095914e-06 1.56367914e-05
 3.33533135e-05 7.39422467e-06 4.77271578e-06 2.64052323e-06
 1.19199576e-05 6.04651450e-06 6.36793493e-06 8.23233950e-06
 2.41550006e-05 8.61847468e-07 2.52258189e-07 4.13605994e-06
 4.08686446e-07 3.73306671e-06 2.10629423e-06 3.60250233e-05
 4.34830690e-05 2.45403444e-05 3.06803668e-05 7.22433015e-06
 4.74784429e-06 5.99156510e-06 8.30830686e-07 1.43794086e-05
 4.00624413e-05 2.06331697e-05 1.07062015e-05 6.17354635e-06
 8.30940621e-07 4.20139315e-07 2.94868660e-05 6.25355315e-05
 3.41198829e-05 7.59986824e-06 1.43123566e-06 1.02090063e-07
 1.82451424e-06 5.61149477e-07 3.54327767e-06 5.25516816e-06
 5.30587840e-06 5.50769164e-06 1.14687396e-06 2.15286207e-07
 9.49729156e-07 2.01238186e-06 1.29762554e-06 2.84179100e-06
 3.61252387e-06 1.16148738e-06 1.81653695e-06 2.21326877e-06
 7.31070656e-07]
kapre:
[2.10503643e-07 1.07328151e-05 3.27906091e-05 2.41272919e-05
 2.22683550e-04 4.85209981e-04 7.24630590e-05 6.38650905e-04
 2.50704070e-05 2.82705622e-03 2.77934112e-02 1.64621735e+01
 9.76249619e+01 3.52643204e+01 6.76752180e-02 4.20056749e-03
 1.03895215e-03 2.79177584e-05 2.36190084e-04 1.07281758e-05
 4.55106674e-05 5.87042523e-05 8.30559088e-07 3.46041725e-06
 4.94124179e-05 5.49856559e-05 1.04178871e-05 1.00495499e-05
 1.49147190e-05 2.33061583e-05 2.13377079e-05 1.11869176e-05
 5.73164061e-06 1.51586610e-05 2.62415124e-05 7.84286294e-06
 3.08346284e-06 7.54490111e-06 6.38743722e-06 1.16473675e-05
 1.38958148e-05 5.72920362e-06 3.25148608e-06 1.01013404e-06
 3.15472198e-06 2.72514035e-06 4.63071956e-06 7.10760196e-06
 6.83322469e-06 3.12339444e-06 1.06413847e-06 2.24276801e-06
 5.55769134e-07 6.94419305e-06 1.07184096e-05 2.20156366e-07
 3.27910470e-07 2.31111949e-06 4.77857895e-07 8.53053280e-06
 1.76510712e-05 9.61683781e-06 4.46119338e-06 7.47691971e-08
 1.26017630e-05 1.98423550e-05 9.55039832e-06 5.34410765e-06
 7.24313359e-05 7.72925923e-05 6.11973473e-06 5.04884656e-06
 3.75147692e-06 4.93531161e-07 2.83012855e-06 6.50390632e-07
 1.24906194e-06 5.15332431e-06 1.97740359e-04 6.21827377e-04
 1.49595129e-04 6.07101569e-08 1.95546977e-06 1.99711917e-06
 5.38423728e-06 1.04312203e-05 4.95887025e-06 1.16665524e-06
 7.03001069e-06 1.29644459e-05 1.25045881e-05 2.67749715e-06
 1.61819787e-06 3.04750847e-06 1.42243903e-06 1.26420036e-06
 5.72412682e-06 5.89943056e-06 1.37939253e-06 3.76476180e-07
 2.16017384e-06 1.58477434e-07 1.48163917e-06 1.35657700e-07
 1.65759343e-06 3.60219843e-07 2.98716799e-07 1.66851578e-06
 5.82847815e-06 3.49090055e-06 7.33104571e-06 4.45747037e-06
 7.14527232e-07 7.32973012e-06 1.84675973e-05 6.44594775e-06
 4.06358640e-06 6.39201880e-06 2.40788404e-06 4.67098829e-07
 7.51392326e-06 2.49504183e-06 7.19588229e-07 8.19939999e-07
 2.34847562e-06 6.52894596e-06 5.60504031e-06 1.30243252e-05
 2.53065900e-05 8.76853937e-06 8.52142961e-08 4.89662682e-07
 1.43414275e-06 2.52534255e-06 3.61648267e-06 1.16961712e-06
 5.75085778e-06 6.75814999e-06 6.23207052e-06 9.15121973e-07
 2.12736177e-06 1.94880698e-07 4.12395138e-06 2.19816070e-06
 7.61569936e-07 2.50063704e-06 4.67589598e-06 1.70803687e-05
 2.55859231e-05 1.14936684e-05 1.29284126e-05 1.34704633e-05
 3.78682194e-06 1.83400675e-06 3.40963538e-06 4.29229749e-06
 5.45410330e-06 3.82843245e-06 1.70487851e-06 1.17534370e-07
 4.93632001e-07 8.24547328e-07 3.97695294e-06 1.11136865e-06
 3.52012648e-06 2.74792143e-07 4.18213995e-06 3.78065261e-06
 1.19622041e-06 2.08333358e-06 3.20604136e-06 1.13333172e-05
 1.03674829e-05 5.95082042e-07 1.12123564e-06 7.46934234e-07
 3.34258357e-06 2.29485568e-06 6.29933129e-06 1.60168438e-05
 1.98458019e-05 7.76762499e-06 6.38288338e-06 1.29176101e-06
 8.16358533e-06 5.16837827e-06 9.66390689e-06 9.85370662e-06
 4.78252005e-05 2.65798280e-05 5.26270878e-06 4.66088750e-07
 1.10906876e-06 8.83404851e-08 3.31656793e-06 3.68218844e-06
 4.77747599e-05 1.62547076e-05 1.40993393e-06 6.15355589e-07
 7.64993911e-06 2.81365419e-06 7.38661925e-07 3.26877603e-06
 3.45409462e-05 7.19821082e-06 6.96745110e-06 4.40000395e-06
 1.00918087e-05 3.56211308e-06 1.39305212e-05 2.51807942e-05
 5.12806037e-05 2.05073738e-05 6.04522074e-06 1.79360995e-07
 2.07231164e-06 4.33788227e-06 7.07109075e-06 2.34570434e-05
 5.71101045e-05 2.65538838e-05 1.77651309e-05 2.10709959e-06
 7.31728096e-06 4.84000111e-06 2.36468281e-07 1.27827934e-05
 5.86259121e-05 1.21953954e-05 5.38238146e-06 4.90689581e-06
 3.07824371e-06 5.49033939e-06 1.37812503e-05 2.11169099e-05
 4.31692861e-05 1.22320580e-05 5.90473483e-08 1.09466146e-06
 1.40988800e-06 1.19009474e-06 1.64519724e-06 1.00207683e-06
 1.62213564e-05 9.27199017e-06 3.46721496e-09 1.24941300e-06
 7.79272170e-07 1.23441396e-06 4.51954406e-07 1.01647936e-06
 1.64993992e-06 2.13383888e-08 1.22975166e-07 2.70594569e-06
 4.35330230e-06]
torchaudio:
tensor([1.1085919596e-06, 8.7416333372e-07, 8.5420673713e-05, 3.1912513805e-05,
        3.5078224028e-04, 5.7035905775e-04, 4.7150941100e-04, 1.3308378402e-04,
        2.3616488033e-04, 2.6768306270e-03, 2.6586160064e-02, 1.6547237396e+01,
        9.8220054626e+01, 3.5839569092e+01, 8.7323062122e-02, 3.6939587444e-03,
        6.5921725763e-05, 2.0348565886e-04, 1.8039680435e-04, 6.9651767262e-05,
        5.1938750403e-05, 5.2743933338e-05, 7.6335072663e-06, 8.8888937171e-06,
        9.7458898381e-05, 1.6111114746e-05, 2.2259262096e-06, 1.5396017261e-05,
        9.8114260254e-06, 8.7139227389e-07, 2.6703144158e-06, 4.1076059460e-06,
        8.0315194282e-06, 9.2684649644e-07, 1.3534619939e-05, 1.1801299479e-05,
        1.6591417307e-06, 6.0596644289e-06, 2.6921127301e-06, 6.9712923505e-06,
        9.7066240414e-06, 2.3829920792e-06, 4.2553261892e-07, 2.3944544409e-06,
        8.1562966443e-06, 3.3169001199e-06, 4.0622198867e-06, 1.6454349634e-06,
        1.7315956939e-05, 1.0891802958e-05, 1.8550374534e-06, 6.2603412516e-06,
        5.0329581427e-06, 3.1125259738e-06, 7.5330190157e-06, 3.8295102058e-06,
        2.8135409593e-06, 1.4282861684e-06, 4.0639625354e-07, 1.1150749742e-05,
        1.8701482986e-05, 8.6359341367e-07, 3.6182150609e-08, 3.5066186683e-06,
        1.1059701137e-05, 2.1689891128e-06, 6.6743550633e-06, 6.0616307564e-06,
        4.1997358494e-05, 5.5531636463e-05, 3.7942679683e-06, 2.0505008251e-06,
        6.1096188801e-06, 2.3229629278e-06, 1.0929688870e-06, 1.7262944141e-07,
        1.7112291744e-06, 3.2940769756e-07, 1.4321837807e-04, 5.3909921553e-04,
        1.1234367412e-04, 1.2626143189e-06, 2.3902703106e-06, 5.6611997934e-07,
        4.3077707232e-06, 1.2764099665e-05, 7.1880513133e-06, 1.1186617712e-06,
        5.5832351791e-06, 1.7592982431e-06, 8.1680827861e-06, 1.3120950371e-06,
        6.1165189891e-07, 2.2518211154e-06, 1.6978743744e-08, 4.9324862630e-06,
        9.6595194918e-06, 4.0953082134e-06, 1.3272564274e-06, 3.3760909446e-07,
        1.1820188774e-06, 1.5062996681e-06, 2.4100829705e-06, 1.9254853214e-06,
        7.3754438290e-07, 3.8126046320e-07, 8.5202302103e-08, 8.8173152335e-07,
        2.8906054013e-06, 1.1079111573e-05, 9.3659109552e-06, 2.5074032237e-06,
        4.4394306542e-07, 4.7513067329e-06, 1.1057238225e-05, 1.3688341824e-05,
        7.6888900367e-06, 3.8064831642e-06, 3.5889702303e-06, 1.7470011926e-06,
        3.8522034629e-06, 2.2945446290e-06, 4.3212781975e-07, 2.1980492875e-06,
        9.6164296792e-07, 1.2637544842e-06, 3.9310266402e-06, 9.7628299045e-06,
        7.9507735791e-06, 5.4773845477e-06, 2.8754484447e-06, 2.5499450658e-06,
        1.1050443618e-06, 1.8116616047e-06, 5.5937616708e-06, 4.7577914302e-06,
        4.2679448597e-06, 1.8633054424e-06, 3.4411425531e-06, 3.7924364733e-06,
        5.1214324230e-06, 4.9592508731e-06, 3.6419687603e-06, 8.4022553892e-07,
        1.9339637447e-08, 1.4970397615e-06, 3.9152061504e-07, 6.2176459323e-06,
        2.4061218937e-05, 5.9161088757e-06, 1.1953011381e-05, 1.6350737496e-05,
        7.3055917937e-07, 3.1448200843e-06, 3.8676716940e-06, 1.3829227541e-08,
        2.1310092961e-06, 2.5922074087e-07, 3.3206029002e-06, 9.1224779908e-07,
        4.9414563819e-07, 1.5721971067e-06, 8.0393983808e-06, 3.7872141547e-06,
        1.0712070434e-05, 6.7462219704e-06, 7.8671318988e-06, 5.0984599511e-06,
        3.1249840049e-06, 1.8197926011e-07, 1.4100561430e-07, 4.1973780753e-06,
        8.6645086412e-06, 3.1806287097e-06, 3.2660070701e-06, 2.2883975817e-06,
        7.8061084423e-06, 1.2666750990e-06, 2.0169229629e-06, 2.4039720756e-06,
        8.7027538029e-06, 4.4297669888e-07, 2.5916860977e-06, 4.2505575948e-06,
        1.5891142539e-05, 8.2322330854e-06, 2.2563617676e-06, 3.6247311073e-05,
        7.6713491580e-05, 3.9481426938e-05, 6.5570793595e-06, 6.1529010509e-07,
        4.7038071216e-09, 2.4175894850e-06, 1.0558300346e-05, 2.9074082704e-05,
        4.0942046326e-05, 1.3586100977e-05, 5.8111495491e-06, 9.0961948445e-06,
        2.0854165996e-05, 5.2034565670e-06, 2.2510116651e-06, 1.5635940144e-05,
        3.3350926969e-05, 7.3940932452e-06, 4.7724929573e-06, 2.6404939035e-06,
        1.1919973076e-05, 6.0465740717e-06, 6.3677894104e-06, 8.2320975707e-06,
        2.4154964194e-05, 8.6181165670e-07, 2.5229809353e-07, 4.1360444811e-06,
        4.0861621642e-07, 3.7330628402e-06, 2.1062789983e-06, 3.6024848669e-05,
        4.3483545596e-05, 2.4540684535e-05, 3.0680472264e-05, 7.2243365139e-06,
        4.7476819418e-06, 5.9917110775e-06, 8.3084609059e-07, 1.4379291315e-05,
        4.0062008338e-05, 2.0633569875e-05, 1.0706566172e-05, 6.1734754127e-06,
        8.3098683490e-07, 4.2010333345e-07, 2.9486856874e-05, 6.2536200858e-05,
        3.4119166230e-05, 7.5997040767e-06, 1.4312838630e-06, 1.0208852075e-07,
        1.8245158344e-06, 5.6118392422e-07, 3.5434868551e-06, 5.2569112086e-06,
        5.3045478126e-06, 5.5067953326e-06, 1.1467917602e-06, 2.1528911986e-07,
        9.4952440577e-07, 2.0123570721e-06, 1.2975981463e-06, 2.8416563964e-06,
        3.6123706195e-06, 1.1615276208e-06, 1.8166758764e-06, 2.2134058781e-06,
        7.3117666943e-07])
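To put numbers on the comparison, here is a small numpy sketch that takes seven bins (0 to 3 and 11 to 13) of the printed column 100 and computes each library's relative error against librosa. Only these values, copied from the output above, are used; `rel_err` is a hypothetical helper, not part of any of the libraries involved.

```python
import numpy as np

# Frequency bins 0-3 and 11-13 of column 100, copied from the printed output above.
bins = np.array([0, 1, 2, 3, 11, 12, 13])
librosa_col = np.array([1.10851852e-06, 8.74158843e-07, 8.54210521e-05, 3.19121718e-05,
                        1.65472374e+01, 9.82200775e+01, 3.58395729e+01])
torch_col = np.array([1.1085919596e-06, 8.7416333372e-07, 8.5420673713e-05, 3.1912513805e-05,
                      1.6547237396e+01, 9.8220054626e+01, 3.5839569092e+01])
kapre_col = np.array([2.10503643e-07, 1.07328151e-05, 3.27906091e-05, 2.41272919e-05,
                      1.64621735e+01, 9.76249619e+01, 3.52643204e+01])

def rel_err(x, ref):
    """Element-wise relative error of x with respect to the reference ref."""
    return np.abs(x - ref) / np.abs(ref)

err_torch = rel_err(torch_col, librosa_col)
err_kapre = rel_err(kapre_col, librosa_col)
for b, et, ek in zip(bins, err_torch, err_kapre):
    print(f"bin {b:2d}: torchaudio {et:.1e}, kapre {ek:.1e}")
```

With these values torchaudio stays within about 1e-4 of librosa in every bin, kapre is within about 2% on the peak bins 11 to 13, and kapre's bin 1 is off by more than a factor of ten.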

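The librosa and torchaudio outputs are very close, but kapre's are not: only the dominant bins roughly agree, while several low-energy bins differ by roughly an order of magnitude.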
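Part of why librosa and torchaudio agree so closely is that, as far as I can tell from their documented defaults, `librosa.stft` (0.7.x) and `torchaudio.transforms.Spectrogram` use the same conventions for this call: a periodic Hann window of length n_fft, center=True with reflect padding, and no normalization. The numpy sketch below reproduces that convention without either library. `reference_power_spectrogram` is a hypothetical helper, and the defaults it encodes are assumptions worth checking against the installed versions.

```python
import numpy as np

def reference_power_spectrogram(y, n_fft=512, hop_length=256):
    """Power spectrogram of shape (n_fft // 2 + 1, n_frames), assuming the
    librosa/torchaudio default conventions: periodic Hann window,
    win_length = n_fft, center=True with reflect padding, no normalization."""
    y = np.asarray(y, dtype=np.float64)
    # Centre frame t on sample t * hop_length by padding n_fft // 2 on both sides.
    y_pad = np.pad(y, n_fft // 2, mode="reflect")
    # Periodic ("fftbins") Hann window of length n_fft.
    window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    n_frames = 1 + (len(y_pad) - n_fft) // hop_length
    # Row t holds samples [t * hop_length, t * hop_length + n_fft) of y_pad.
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = y_pad[idx] * window
    spec = np.fft.rfft(frames, axis=1)
    return (np.abs(spec) ** 2).T  # (freq, time), like librosa

# Usage: a 500 Hz tone at 8 kHz falls exactly on bin 500 / (8000 / 512) = 32.
sr = 8000
tone = np.sin(2 * np.pi * 500.0 * np.arange(sr) / sr)
S = reference_power_spectrogram(tone)
print(S.shape, int(np.argmax(S[:, 10])))  # (257, 32) 32
```

Feeding the first sr*3 samples of the same file through this function and comparing column 100 with the librosa and torchaudio columns would show whether those defaults really match.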

keunwoochoi commented 4 years ago

Hi, thanks for reporting it. This is actually a problem I couldn't solve, and it is tricky to dig into because in Kapre the STFT is implemented with DFT kernels rather than with an FFT. Ultimately, the way to fix it is to use tf.stft (tf.signal.stft in TensorFlow 1.15). Technically that is not difficult, and I already posted a gist that does the job: https://keunwoochoi.wordpress.com/2019/10/06/tensorflow-melspectrogram-layer-2-colab-notebook-and-its-compatibility-to-librosa/ But I am not actively developing Kapre and can't spare the time to update it. I would be happy to review a PR :)
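The difference between DFT kernels and an FFT can be illustrated with a minimal numpy sketch. It assumes plain cosine/sine DFT basis vectors; the thread does not describe Kapre's exact kernels or windowing. Multiplying frames by the real and imaginary kernels (one dot product per bin, as a convolution with those kernels would do) gives the same power spectrum as `np.fft.rfft` up to rounding. That suggests, without proving it, that the discrepancy comes from framing, padding, or windowing rather than from the DFT formulation itself, although float32 rounding may add some error in very low-energy bins. `dft_kernels` and `stft_power_dft` are hypothetical helpers.

```python
import numpy as np

def dft_kernels(n_fft):
    """Real and imaginary DFT basis vectors for the non-negative frequencies,
    each of shape (n_fft // 2 + 1, n_fft)."""
    k = np.arange(n_fft // 2 + 1)[:, None]
    n = np.arange(n_fft)[None, :]
    # Reduce k * n modulo n_fft (exact integer arithmetic) to keep the angles small.
    angle = 2 * np.pi * ((k * n) % n_fft) / n_fft
    return np.cos(angle), -np.sin(angle)

def stft_power_dft(frames, n_fft):
    """Power spectrum of each (already windowed) frame by direct DFT:
    O(n_fft**2) work per frame instead of the FFT's O(n_fft log n_fft)."""
    real_k, imag_k = dft_kernels(n_fft)
    re = frames @ real_k.T  # (n_frames, n_fft // 2 + 1)
    im = frames @ imag_k.T
    return re ** 2 + im ** 2

# Compare against the FFT on random float64 frames.
rng = np.random.default_rng(0)
frames = rng.standard_normal((8, 512))
via_dft = stft_power_dft(frames, 512)
via_fft = np.abs(np.fft.rfft(frames, axis=1)) ** 2
print(np.max(np.abs(via_dft - via_fft)) / via_fft.max())  # rounding-level error only
```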

Toku11 commented 4 years ago

Here is the notebook with the errors in the last cells fixed:

https://github.com/Toku11/kapre/blob/delta/examples/STFT_and_Melspectrogram_in_Tensorflow_vs_Librosa.ipynb
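One pitfall when comparing a TensorFlow STFT with librosa, as the notebook does, is frame alignment. tf.signal.stft does not centre frames: frame t starts at sample t * frame_step, and its pad_end option only pads the tail. librosa with center=True pads n_fft // 2 samples on both sides, so frame t is centred on sample t * hop_length. The numpy stand-in below (no TensorFlow involved; `frame` and `power_spec` are hypothetical helpers) shows that with n_fft=512 and hop=256 the centred frames are the uncentred ones shifted by one index, so comparing column 100 against column 100 would compare different audio.

```python
import numpy as np

def frame(y, n_fft, hop_length):
    """Uncentred framing: row t holds y[t * hop_length : t * hop_length + n_fft]."""
    n_frames = 1 + (len(y) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]

def power_spec(frames, window):
    """Power spectrum of each windowed frame, shape (n_frames, n_fft // 2 + 1)."""
    return np.abs(np.fft.rfft(frames * window, axis=1)) ** 2

n_fft, hop = 512, 256
window = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)  # periodic Hann
rng = np.random.default_rng(0)
y = rng.standard_normal(8000)

# Frame t starts at sample t * hop (no centring).
uncentered = power_spec(frame(y, n_fft, hop), window)
# Frame t is centred on sample t * hop (reflect padding, librosa-style).
centered = power_spec(frame(np.pad(y, n_fft // 2, mode="reflect"), n_fft, hop), window)

shift = (n_fft // 2) // hop  # 1 for n_fft=512, hop=256
print(np.allclose(centered[shift:shift + len(uncentered)], uncentered))  # True
print(np.allclose(centered[:len(uncentered)], uncentered))  # False
```

Padding n_fft // 2 samples on both sides before calling an uncentred STFT, or offsetting the frame index, is one way to line the two conventions up.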

keunwoochoi commented 4 years ago

This will be fixed in 0.3.