Hi, I am working on speech recognition for a microcontroller. I am new to this and am trying to modify code that was written for acoustic scene classification, which used a dataset of 30-second WAV audio clips.
Now I need to use a dataset of 1-second clips for speech recognition, but I am not getting proper values after feature extraction.
Below is the code I am using for the log-mel spectrogram. Can you please help me?
"""LogMel Feature Extraction example."""
import numpy as np
import sys
import librosa
import librosa.display
import scipy.fftpack as fft
SR = 16000
N_FFT = 1024
N_MELS = 30

def create_col(y):
    assert y.shape == (1024,)

def feature_extraction(y):
    assert y.shape == (32, 1024)
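
One possible cause, going only by the shape checks above: feature_extraction expects 32 frames of 1024 samples, but a 1-second clip at SR = 16000 has only 16000 samples. The sketch below counts how many frames such a clip yields and shows one way to zero-pad it up to 32 frames. The hop length of 512 (HOP) and the helper name frame_signal are assumptions for illustration; neither appears in the original code, so the numbers would change with a different hop.

```python
import numpy as np

SR = 16000     # sample rate, as in the post
N_FFT = 1024   # frame length, as in the post
HOP = 512      # ASSUMED hop length (50% overlap); not stated in the post


def frame_signal(y, frame_len=N_FFT, hop=HOP):
    """Hypothetical helper: split a 1-D signal into overlapping frames.

    Any incomplete frame at the end of the signal is dropped.
    """
    n_frames = 1 + (len(y) - frame_len) // hop
    # Row i holds samples [i * hop, i * hop + frame_len)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]


# Stand-in for a 1-second clip (silence); a real clip would come from a WAV file.
y = np.zeros(SR, dtype=np.float32)
frames = frame_signal(y)
print(frames.shape)  # (30, 1024): two frames short of the asserted (32, 1024)

# Zero-pad the clip so that exactly 32 frames fit.
needed = N_FFT + HOP * (32 - 1)  # 16896 samples
y_padded = np.pad(y, (0, needed - len(y)))
padded_frames = frame_signal(y_padded)
print(padded_frames.shape)  # (32, 1024)
```

If the original code uses non-overlapping frames instead (hop equal to N_FFT), a 1-second clip gives only 15 full frames, so the assertion would need to change or the clip would need much more padding.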