A Python package for modern audio feature extraction
For information about contributing, citing, licensing (including commercial licensing) and getting in touch, please see our wiki.
Our documentation can be found here. Our paper can be found here.
Please join our Slack channel if you have questions or suggestions!
Install using pip
pip install surfboard
Alternatively,
git clone https://github.com/novoic/surfboard.git
cd surfboard
pip3 install .
Given a set of components and an optional set of statistics to apply to the time-varying components, extract them using Python.
from surfboard.sound import Waveform
from surfboard.feature_extraction import extract_features
sound = Waveform(path='/path/to/audio.wav')
# Option 1: Extract MFCC and RMS energy components as time series.
component_dataframe = extract_features([sound], ['mfcc', 'rms'])
# Option 2: Extract the mean and standard deviation of the MFCC and RMS energy features over time.
feature_dataframe = extract_features([sound], ['mfcc', 'rms'], ['mean', 'std'])
# Option 3: Extract the mean and standard deviation of the MFCC (with non-default arguments) and RMS energy features over time.
mfcc_with_arg = {'mfcc': {'n_mfcc': 26, 'n_fft_seconds': 0.08, 'hop_length_seconds': 0.02}}
feature_with_args_dataframe = extract_features([sound], [mfcc_with_arg, 'rms'], ['mean', 'std'])
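To run the same extraction over several files from Python, one option is to build a list of Waveform objects and pass it to extract_features in a single call. The sketch below is only an illustration and is not part of Surfboard: the helper names list_wav_paths and extract_folder are hypothetical, and it assumes extract_features accepts a list holding more than one waveform, as its list-valued first argument suggests.

```python
from pathlib import Path


def list_wav_paths(folder):
    """Return the sorted paths of all .wav files directly inside `folder`."""
    return sorted(str(p) for p in Path(folder).glob('*.wav'))


def extract_folder(folder, components, statistics=None):
    """Hypothetical helper: extract features for every .wav file in `folder`."""
    # Imported lazily so that listing files works without Surfboard installed.
    from surfboard.sound import Waveform
    from surfboard.feature_extraction import extract_features

    waveforms = [Waveform(path=p) for p in list_wav_paths(folder)]
    if statistics is None:
        # No statistics: keep the components as time series.
        return extract_features(waveforms, components)
    return extract_features(waveforms, components, statistics)
```

For example, extract_folder('my_wav_folder', ['mfcc', 'rms'], ['mean', 'std']) would mirror Option 2 above for a whole folder, under the assumptions just stated.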
.wav files
Assume the following directory structure:
my_wav_folder/
│ swell.wav
│ cool_hair.wav
│ wave_crash.wav
│ ...
Using a .yaml config (see example_configs for examples), you can use the surfboard CLI to return a .csv file containing a set of features computed for every .wav file in my_wav_folder. You can optionally use multiple processes with the -j flag.
surfboard compute-features -i my_wav_folder -o cool_features.csv -F surfboard/example_configs/spectral_features.yaml -j 4
.yaml config
You can create a custom .yaml config in order to extract specific features from your audio data. You can also pick specific statistics to apply to the time-varying components. The set of available statistics is described in the Available statistics section below.
Take a peek at the example configs in surfboard/example_configs/. The package assumes a .yaml file with the following structure:
components:
  - mfcc:
      n_mfcc: 26
  - log_melspec:
      n_mels: 64
statistics:
  - mean
  - std
This config will compute the mean and standard deviation of each of the 26 MFCCs (13 by default) and each of the 64 log mel-spectrogram filterbanks (128 by default) on every .wav file in my_wav_folder if called with the following command:
surfboard compute-features -i my_wav_folder -o epic_features.csv -F my_config.yaml
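For comparison, the same request can be written with the Python API, using the dictionary form for components with arguments shown in the Python usage example earlier. This is a sketch under the assumption that the CLI and extract_features accept the same component names and arguments; extract_with_config is a hypothetical helper used only for illustration.

```python
# Python-side counterpart of the config above (assumed to be equivalent).
components = [
    {'mfcc': {'n_mfcc': 26}},
    {'log_melspec': {'n_mels': 64}},
]
statistics = ['mean', 'std']


def extract_with_config(wav_path):
    """Hypothetical helper: apply the settings above to a single .wav file."""
    # Imported lazily so the settings can be inspected without Surfboard installed.
    from surfboard.sound import Waveform
    from surfboard.feature_extraction import extract_features

    sound = Waveform(path=wav_path)
    return extract_features([sound], components, statistics)
```

Keeping the settings in plain Python structures makes it easy to check them before running a long extraction job.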
compute-components functionality
Sometimes you might want to retain the time axis of time-dependent components but still use the CLI. Given a .yaml config without a statistics section, you can. It will dump the components as a .pkl file, which can be loaded with pd.read_pickle.
surfboard compute-components -i my_wav_folder -o epic_features.pkl -F surfboard/example_configs/chroma_components.yaml
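Loading the output back into Python might look like the sketch below. The exact layout of the pickled object is not described here, so the example avoids assuming particular column names: load_components and summarize are hypothetical helpers that simply wrap pd.read_pickle and print a quick overview of whatever comes back.

```python
import pandas as pd


def load_components(pkl_path):
    """Load the object written by `surfboard compute-components`."""
    return pd.read_pickle(pkl_path)


def summarize(components):
    """Print a quick overview without assuming a particular column layout."""
    print(type(components).__name__)
    if isinstance(components, pd.DataFrame):
        print(components.shape)
        print(list(components.columns))
```

With the command above, this would be used as summarize(load_components('epic_features.pkl')).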
We have provided notebooks in notebook_tutorials for examples of how Surfboard can be used to extract features from audio and even to perform environmental sound classification.
Otherwise, here are some examples:
Define a waveform:
from surfboard.sound import Waveform
import numpy as np
# Instantiate from a .wav file.
sound = Waveform(path="/surf/in/USA/sound.wav", sample_rate=44100)
# OR: instantiate from a numpy array.
sound = Waveform(signal=np.sin(np.arange(0, 2 * np.pi, 1/24000)), sample_rate=44100)
Get the F0 contour:
import matplotlib.pyplot as plt
f0_contour = sound.f0_contour()
plt.plot(f0_contour[0])
Get the MFCCs:
mfccs = sound.mfcc()
Get different shimmers, jitters, formants:
shimmers = sound.shimmers()
jitters = sound.jitters()
formants = sound.formants()
You can take a look at COMPONENTS.md to see which components can be computed using Surfboard.
There is extensive documentation in the method docstrings in surfboard/sound.py. Please refer to those (or to our documentation) for more details on each individual feature.
A thorough list of the statistics implemented in Surfboard can be found in STATISTICS.md.
Often, the components computed from the surfboard.sound.Waveform class have a time dimension, in which case they are represented as numpy arrays with shape [n_components, T]. For example, a log mel spectrogram can be an array with shape [128, T]. We often want a fixed-length representation of variable-length audio signals, so we need to aggregate over the time dimension.
Following best practices, we have implemented a variety of statistics which take an array with shape [n_components, T] and return an array with shape [n_components,], aggregating each component along the time dimension. These are implemented in surfboard/statistics.py.
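The idea of aggregating over time can be illustrated with plain NumPy. This is a minimal sketch on random stand-in data, not a call into surfboard/statistics.py, and Surfboard's own implementations may differ in detail (for example in how the standard deviation is normalized).

```python
import numpy as np

# Stand-in for a time-varying component: 128 bands over T = 500 frames.
rng = np.random.default_rng(0)
component = rng.normal(size=(128, 500))

# Aggregate along the time axis (axis 1): one value per component.
mean_over_time = component.mean(axis=1)
std_over_time = component.std(axis=1)

print(mean_over_time.shape)  # (128,)

# Concatenating several statistics gives a fixed-length feature vector
# whose size does not depend on T.
feature_vector = np.concatenate([mean_over_time, std_over_time])
print(feature_vector.shape)  # (256,)
```

Changing T in the stand-in array changes nothing about the output shapes, which is what makes the result usable as a fixed-length representation.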
Some very rudimentary tests have been implemented in the tests directory, to make sure that methods run successfully. Feel free to use them while developing new components/statistics.
What does _slidingwindow at the end of a component name mean? Many of the components above are defined as floating point numbers computed from a sequence of arbitrary length. Sometimes it makes more sense to see how these metrics change over time as a sliding window moves over the waveform. This is what "sliding window" means here: we compute the component on a sliding window.
What is returned by sound.{}? Try it out! Otherwise, take a look at our documentation, or the docstrings in surfboard/sound.py, to see the returned types.
Can I use Surfboard on .mp3 files? Yes, but it might take longer than running Surfboard on .wav files because of how LibROSA loads .mp3 files. For large jobs, we advise first converting .mp3 files to .wav files using ffmpeg.
Why are the .csv files obtained from the CLI full of NaNs? Feature extraction can sometimes fail, either for a specific component/statistic or for an entire audio file, for a variety of reasons. When such a failure occurs, we populate the dataframe with a NaN.
Surfboard is released under dual commercial and open source licenses. This is the open-source (GPL v3.0) version. See LICENSE for more details.