Keras Audio Preprocessors: compute STFT, ISTFT, Melspectrogram, and more on the GPU in real time.
Tested on Python 3.6 and 3.7
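Data format agnostic: Kapre layers support both channels_first and channels_last.
Kapre layers extend the default tf.signal implementation with features such as an STFT and InverseSTFT pair.
Workflow: add a Kapre layer, e.g. kapre.time_frequency.STFT(), as the first layer of the model, and tune DSP parameters such as n_fft to boost the performance.
Install with: pip install kapre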
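The reason an STFT can be paired with an exact inverse is weighted overlap-add: each frame is windowed, transformed, inverse-transformed, windowed again, added back in place, and the sum is divided by the overlap-added squared window. The sketch below illustrates that idea in plain NumPy. It is not Kapre's implementation (Kapre builds on tf.signal); the function names `stft`/`istft`, the periodic Hann window, and the defaults `n_fft=512`, `hop_length=128` are assumptions chosen for illustration.

```python
import numpy as np


def stft(x, n_fft=512, hop_length=128):
    """Frame a 1-D signal, apply a periodic Hann window, and take the real FFT.

    Illustrative sketch (not Kapre's API). No padding is applied, so trailing
    samples that do not fill a whole frame are dropped.
    Returns a complex array of shape (n_frames, n_fft // 2 + 1).
    """
    if len(x) < n_fft:
        raise ValueError("signal must be at least n_fft samples long")
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - n_fft) // hop_length
    frames = np.stack([x[i * hop_length:i * hop_length + n_fft] for i in range(n_frames)])
    return np.fft.rfft(frames * window, n=n_fft, axis=-1)


def istft(spec, hop_length=128):
    """Invert stft() by weighted overlap-add (illustrative sketch)."""
    n_frames, n_bins = spec.shape
    n_fft = 2 * (n_bins - 1)
    window = np.hanning(n_fft + 1)[:-1]
    # Back to the time domain, then apply the synthesis window (same as analysis).
    frames = np.fft.irfft(spec, n=n_fft, axis=-1) * window
    length = n_fft + hop_length * (n_frames - 1)
    y = np.zeros(length)
    wsum = np.zeros(length)
    for i in range(n_frames):
        start = i * hop_length
        y[start:start + n_fft] += frames[i]
        wsum[start:start + n_fft] += window ** 2
    # Normalise only where the summed squared window is not (nearly) zero;
    # the first and last few samples, where the window vanishes, stay near zero.
    nonzero = wsum > 1e-8
    y[nonzero] /= wsum[nonzero]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
spec = stft(x)   # complex, shape (29, 257)
y = istft(spec)  # 512 + 128 * 28 == 4096 samples, same as x here
# Away from the edges, where several frames overlap, the round trip recovers x.
print(np.allclose(y[512:-512], x[512:-512]))  # True
```

The division by the overlap-added squared window is what makes the pair invertible for any hop size at which that sum stays nonzero; only the edge samples, where the window tapers to zero, are lost.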
Please refer to Kapre API Documentation at https://kapre.readthedocs.io
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, BatchNormalization, ReLU, GlobalAveragePooling2D, Dense, Softmax
from kapre import STFT, Magnitude, MagnitudeToDecibel
from kapre.composed import get_melspectrogram_layer, get_log_frequency_spectrogram_layer
# 6 channels (!), e.g. a 1-second audio signal sampled at 44.1 kHz.
input_shape = (44100, 6)
sr = 44100
model = Sequential()
# A STFT layer
model.add(STFT(n_fft=2048, win_length=2018, hop_length=1024,
window_name=None, pad_end=False,
input_data_format='channels_last', output_data_format='channels_last',
input_shape=input_shape))
model.add(Magnitude())
model.add(MagnitudeToDecibel()) # these three layers can be replaced with get_stft_magnitude_layer()
# Alternatively, you may want to use a melspectrogram layer
# melgram_layer = get_melspectrogram_layer()
# or log-frequency layer
# log_stft_layer = get_log_frequency_spectrogram_layer()
# add more layers as you want
model.add(Conv2D(32, (3, 3), strides=(2, 2)))
model.add(BatchNormalization())
model.add(ReLU())
model.add(GlobalAveragePooling2D())
model.add(Dense(10))
model.add(Softmax())
# Compile the model
model.compile('adam', 'categorical_crossentropy') # if single-label classification
# train it with raw audio sample inputs
# for example, you may have functions that load your data as below.
x = load_x() # e.g., x.shape = (10000, 44100, 6) to match input_shape (channels_last)
y = load_y() # e.g., y.shape = (10000, 10) if it's 10-class classification
# then..
model.fit(x, y)
# Done!
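Because the model consumes raw audio, it helps to know what shape the STFT layer passes on to Conv2D. The sketch below computes that per-example shape, assuming Kapre's STFT frames the signal the way tf.signal.stft does: frames of win_length samples taken every hop_length samples, each zero-padded to n_fft for an FFT with n_fft // 2 + 1 bins. `stft_output_shape` is a hypothetical helper, not part of Kapre, and it ignores any layer options other than the ones listed.

```python
import math


def stft_output_shape(n_samples, n_channels, n_fft, win_length, hop_length,
                      pad_end=False, output_data_format="channels_last"):
    """Hypothetical helper: per-example STFT output shape (batch axis excluded).

    Assumes tf.signal.stft framing rules.
    """
    if pad_end:
        # The signal end is zero-padded so that every hop starts a frame.
        n_frames = math.ceil(n_samples / hop_length)
    else:
        # Only frames that fit entirely inside the signal are kept.
        n_frames = max(0, 1 + (n_samples - win_length) // hop_length)
    n_bins = n_fft // 2 + 1  # one-sided spectrum of an n_fft-point FFT
    if output_data_format == "channels_last":
        return (n_frames, n_bins, n_channels)
    return (n_channels, n_frames, n_bins)  # channels_first


# The configuration used in the example above.
print(stft_output_shape(44100, 6, n_fft=2048, win_length=2018, hop_length=1024))
# (42, 1025, 6)
```

Doubling n_fft roughly doubles the number of frequency bins, while halving hop_length roughly doubles the number of frames, which is the resolution trade-off behind tuning these parameters. Calling model.summary() on the Keras model reports the actual shapes.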
The STFT layer is not TFLite compatible because it relies on tf.signal.stft. To create a TFLite-compatible model, first train using the normal Kapre layers, then build a new model in which STFT and Magnitude are replaced with STFTTflite and MagnitudeTflite.
TFLite-compatible layers are restricted to a batch size of 1, which prevents their use during training.
# assumes you have run the one-shot example above.
from kapre import STFTTflite, MagnitudeTflite
model_tflite = Sequential()
model_tflite.add(STFTTflite(n_fft=2048, win_length=2018, hop_length=1024,
window_name=None, pad_end=False,
input_data_format='channels_last', output_data_format='channels_last',
input_shape=input_shape))
model_tflite.add(MagnitudeTflite())
model_tflite.add(MagnitudeToDecibel())
model_tflite.add(Conv2D(32, (3, 3), strides=(2, 2)))
model_tflite.add(BatchNormalization())
model_tflite.add(ReLU())
model_tflite.add(GlobalAveragePooling2D())
model_tflite.add(Dense(10))
model_tflite.add(Softmax())
# load the trained weights into the tflite compatible model.
model_tflite.set_weights(model.get_weights())
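Since the TFLite-compatible layers only accept a batch size of 1, running the deployed model over a dataset means feeding examples one at a time. A minimal sketch of that loop, assuming a hypothetical callable `predict_single` that maps an array of shape (1, ...) to an output of shape (1, ...); the helper name and the stand-in model are illustrative only.

```python
import numpy as np


def predict_in_batches_of_one(predict_single, x):
    """Apply a batch-size-1 model to every example in x (hypothetical helper).

    predict_single: callable taking an array of shape (1, ...) and returning
                    an array of shape (1, ...).
    x: array of shape (N, ...).
    Returns the N outputs concatenated along the batch axis.
    """
    # x[i:i + 1] (rather than x[i]) keeps a leading batch axis of size 1.
    outputs = [predict_single(x[i:i + 1]) for i in range(len(x))]
    return np.concatenate(outputs, axis=0)


# Stand-in for a batch-size-1 model: it rejects any larger batch.
def fake_single_predict(batch):
    if batch.shape[0] != 1:
        raise ValueError("this model only accepts a batch size of 1")
    return batch.mean(axis=(1, 2))[:, None]  # shape (1, 1)


x = np.random.default_rng(0).standard_normal((3, 44100, 6)).astype(np.float32)
print(predict_in_batches_of_one(fake_single_predict, x).shape)  # (3, 1)
```

With a real converted model (for instance bytes from tf.lite.TFLiteConverter.from_keras_model(model_tflite).convert()), predict_single could wrap a tf.lite.Interpreter: call allocate_tensors() once, then for each sample call set_tensor(input_details[0]['index'], sample), invoke(), and read get_tensor(output_details[0]['index']), with the details taken from get_input_details() and get_output_details().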
Please cite this paper if you use Kapre for your work.
@inproceedings{choi2017kapre,
title={Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras},
author={Choi, Keunwoo and Joo, Deokjin and Kim, Juho},
booktitle={Machine Learning for Music Discovery Workshop at 34th International Conference on Machine Learning},
year={2017},
organization={ICML}
}