Closed tusharpoddar closed 4 years ago
file_len=10 -----> length of the WAV file I am using, in minutes
slice_len=25 -----> size of each window, in milliseconds
step_size=0.9 -----> hop
spec_window=128 -----> nperseg
NFFT=512
How do I convert these to the parameters given in the vggish_params file?
I assume you're using scipy.signal.spectrogram (specgram is defined in matplotlib.pyplot, but doesn't have an nperseg argument).
vggish takes log-mel-spectrum slices as input, as calculated by log_mel_spectrogram.
This calculates a spectrogram internally with:
    spectrogram = stft_magnitude(
        data,
        fft_length=fft_length,
        hop_length=hop_length_samples,
        window_length=window_length_samples)
I think you can replace this with a call to scipy.signal.spectrogram like:
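    # mode='magnitude' is needed because the default mode ('psd') returns power.
    spectrogram = np.abs(scipy.signal.spectrogram(
        data,
        nperseg=window_length_samples,
        noverlap=window_length_samples - hop_length_samples,
        nfft=fft_length,
        mode='magnitude')[2]).transpose()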
This uses a different window than the built-in stft_magnitude (Tukey with r=0.25 vs. periodic Hann), but the results will be pretty similar.
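For anyone who wants to check how close the two really are, here is a minimal sketch, not a tested drop-in replacement. It assumes 16 kHz audio (the sample rate is not stated in this thread) and uses one second of synthetic noise as stand-in data. Passing window='hann', detrend=False and mode='magnitude' is an assumption about how to move scipy closer to a plain windowed FFT magnitude; scipy's defaults are a Tukey window, per-frame mean removal and a power spectral density. scipy may still scale its output by a constant factor, so compare against the reference before relying on absolute values.

```python
import numpy as np
import scipy.signal

# Assumed values for illustration: 16 kHz audio, 25 ms window, 10 ms hop.
SAMPLE_RATE = 16000
window_length_samples = 400
hop_length_samples = 160
fft_length = 512

# Stand-in data: one second of white noise instead of a real WAV file.
data = np.random.default_rng(0).standard_normal(SAMPLE_RATE)

_, _, sxx = scipy.signal.spectrogram(
    data,
    window='hann',          # scipy's 'hann' is periodic by default
    nperseg=window_length_samples,
    noverlap=window_length_samples - hop_length_samples,
    nfft=fft_length,
    detrend=False,          # default is 'constant' (mean removal per frame)
    mode='magnitude')       # default is 'psd' (power, not magnitude)
spectrogram = sxx.transpose()   # rows are frames, columns are frequency bins

# Reference for the first frame: plain periodic-Hann windowed FFT magnitude.
n = np.arange(window_length_samples)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / window_length_samples)
reference = np.abs(np.fft.rfft(data[:window_length_samples] * hann, fft_length))

print(spectrogram.shape)
```

Comparing reference with spectrogram[0] shows whether the two agree exactly or only up to a constant factor in your scipy version.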
window_length_samples is the number of samples in params.STFT_WINDOW_LENGTH_SECONDS (i.e., integer value closest to the time in seconds multiplied by the sample rate), and hop_length_samples is the number of samples in params.STFT_HOP_LENGTH_SECONDS. If you alter these (from the defaults of 0.025 and 0.010), the input to the classifier won't match the training data, and the results may not be particularly meaningful. It's just an embedding though, so maybe it will be useful.
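The seconds-to-samples conversion described above can be sketched as follows. The 16 kHz sample rate is an assumption (it is not given in this thread), and seconds_to_samples and next_power_of_two are hypothetical helper names, not functions from the VGGish code. Rounding the FFT length up to a power of two is one common choice, not necessarily what your pipeline uses. Note that a slice_len of 25 ms is the same as the 0.025 s default window.

```python
# Assumed value: the sample rate is not stated in this thread.
SAMPLE_RATE = 16000                    # Hz (assumption)
STFT_WINDOW_LENGTH_SECONDS = 0.025     # default quoted above
STFT_HOP_LENGTH_SECONDS = 0.010        # default quoted above


def seconds_to_samples(seconds, sample_rate):
    """Integer number of samples closest to `seconds` at `sample_rate`."""
    return int(round(seconds * sample_rate))


def next_power_of_two(n):
    """Smallest power of two >= n (one common choice for the FFT length)."""
    return 1 << (n - 1).bit_length()


window_length_samples = seconds_to_samples(STFT_WINDOW_LENGTH_SECONDS, SAMPLE_RATE)
hop_length_samples = seconds_to_samples(STFT_HOP_LENGTH_SECONDS, SAMPLE_RATE)
fft_length = next_power_of_two(window_length_samples)

# scipy.signal.spectrogram takes an overlap rather than a hop.
noverlap = window_length_samples - hop_length_samples

print(window_length_samples, hop_length_samples, fft_length, noverlap)
# prints: 400 160 512 240
```

With these assumed values, nperseg would be 400, noverlap 240 and nfft 512.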
Also, you said that if I change the parameters, the input to the classifier won't match. How do I train my own model? That is, how do I give it my own training data?