python -m venv vcl_env && source vcl_env/bin/activate
pip install vocalocator
git clone https://github.com/neurostatslab/vocalocator.git && cd vocalocator
conda env create -n vcl -f environment_conda.yml
conda activate vcl
git clone https://github.com/neurostatslab/vocalocator.git && cd vocalocator
pip install pipenv
pipenv install
pipenv shell
This section provides a brief introduction to training a deep neural network from scratch and performing inference with a pretrained model in VCL. We will use a small public dataset released alongside this package.
`train_set.npy` and `val_set.npy` may also be discarded, reserving `test_set.npy`. From the repository's `sample_configs` directory, download `xcorr_config_environment_1.json5` (4 KB). Your working directory should now contain:

|-- gerbilearbud-4m-e1_audio.h5
|-- best_weights.pt (Optional, from the pretrained model zip file)
|-- test_set.npy (Optional, from the test-train split zip file)
|-- xcorr_config_environment_1.json5
python -m vocalocator --config xcorr_config_environment_1.json5 --data gerbilearbud-4m-e1_audio.h5 --save-path my_model
This will periodically print a log of the training process to your terminal, through which you can watch the training loss decrease:
TRAINING. Epoch 1 / 10 [32/6158] minibatch loss: 2.37226
TRAINING. Epoch 1 / 10 [416/6158] minibatch loss: 2.23515
TRAINING. Epoch 1 / 10 [800/6158] minibatch loss: 1.94896
At the end of each epoch, you can check whether performance on the validation set, a subset of the dataset that remains unseen during training, has improved:
>> DONE TRAINING, STARTING TESTING.
TESTING VALIDATION SET. Epoch 1 [416/770]
>> FINISHED EPOCH IN: 56 secs
>> MEAN VALIDATION LOSS, 14.967cm, IS BEST SO FAR, SAVING WEIGHTS TO my_model/best_weights.pt
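If you redirect this output to a file, the loss values can be recovered for plotting later. Here is a minimal sketch, assuming the log was captured to a hypothetical file named `train.log` and follows the format shown above:

```python
# Sketch: pull the minibatch losses out of a saved training log.
# Assumes the output shown above was captured to a file named "train.log".
import re

losses = []
with open("train.log") as f:
    for line in f:
        match = re.search(r"minibatch loss: ([\d.]+)", line)
        if match:
            losses.append(float(match.group(1)))

print(f"parsed {len(losses)} minibatch losses")
```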
Performing inference requires a config file with a `WEIGHTS_PATH` field that points toward the model's trained weights (`best_weights.pt`). If you trained a model above, VCL has already added a `WEIGHTS_PATH` field at the bottom of the new config file at `my_model/config.json`. However, to use the pretrained weights from our website, this line needs to be added manually to `xcorr_config_environment_1.json5`:
...
      "MAX_SNR": 15,
      "PROB": 0.5
    },
    "MASK": {
      "PROB": 0.5,
      "MIN_LENGTH": 75,
      "MAX_LENGTH": 125
    }
!   }, <-- Add a comma here, too
+   "WEIGHTS_PATH": "best_weights.pt"
}
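If you would rather make this edit programmatically, a minimal sketch follows. It is not part of VCL and assumes the third-party `json5` package (`pip install json5`) for parsing the JSON5 config:

```python
# Sketch: add the WEIGHTS_PATH field to the sample config programmatically.
# Assumes the third-party `json5` package is installed (pip install json5).
import json5

config_path = "xcorr_config_environment_1.json5"
with open(config_path) as f:
    config = json5.load(f)

config["WEIGHTS_PATH"] = "best_weights.pt"  # top-level field, as in the diff above

with open(config_path, "w") as f:
    json5.dump(config, f, indent=2)
```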
# To assess the model trained above:
python -m vocalocator.assess --config my_model/config.json --data gerbilearbud-4m-e1_audio.h5 --index my_model/indices/test_set.npy -o assessment.h5

# Or, to assess the pretrained model downloaded from our website:
python -m vocalocator.assess --config xcorr_config_environment_1.json5 --data gerbilearbud-4m-e1_audio.h5 --index test_set.npy -o assessment.h5
| Dataset group/name | Shape | Data type | Description |
|---|---|---|---|
| `/audio` | (*, n_channels) | float | All sound events concatenated along axis 0 |
| `/length_idx` | (n + 1,) | int | Index into the audio dataset. Sound event `i` spans the half-open interval `[length_idx[i], length_idx[i+1])`, and the first element should be 0. |
| `/locations` | (n, 2) | float | Locations associated with each sound event. Only required for training. |
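As an illustration of this layout, the sketch below assembles a compliant HDF5 file with `h5py` from a list of variable-length events. The event count, event lengths, channel count, and random stand-in data are all placeholders:

```python
# Sketch: build an HDF5 dataset in the format above from a list of
# variable-length, multi-channel sound events (random stand-in data).
import h5py
import numpy as np

rng = np.random.default_rng(0)
n_events, n_channels = 100, 4  # placeholders
events = [
    rng.standard_normal((rng.integers(1_000, 2_000), n_channels))
    for _ in range(n_events)
]
locations = rng.uniform(-0.3, 0.3, size=(n_events, 2))  # units match your rig

lengths = [len(e) for e in events]
length_idx = np.concatenate([[0], np.cumsum(lengths)])  # first element is 0

with h5py.File("my_dataset.h5", "w") as f:
    f.create_dataset("audio", data=np.concatenate(events, axis=0))
    f.create_dataset("length_idx", data=length_idx)
    f.create_dataset("locations", data=locations)  # only required for training
```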
Optionally, create a directory containing the index files `train_set.npy` and `val_set.npy`. This directory is passed to VCL through the `--indices` option.
# Simple script to generate a train/validation/test split
import numpy as np

dataset_size = <INSERT DATASET SIZE>  # number of sound events in the dataset
# 80% train, 10% validation, 10% test
train_size, val_size = int(0.8 * dataset_size), int(0.1 * dataset_size)
train_set, val_set, test_set = np.split(
    np.random.permutation(dataset_size), [train_size, train_size + val_size]
)
np.save('train_set.npy', train_set)
np.save('val_set.npy', val_set)
np.save('test_set.npy', test_set)
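As a quick follow-up check, the three saved index files should be disjoint and together cover every index in the dataset. A small sketch using the files written above:

```python
# Sanity check: the split should partition [0, dataset_size) with no overlap.
import numpy as np

splits = [np.load(f"{name}_set.npy") for name in ("train", "val", "test")]
combined = np.concatenate(splits)
assert len(np.unique(combined)) == len(combined), "splits overlap"
assert np.array_equal(np.sort(combined), np.arange(len(combined))), "indices missing"
```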
Sample configs are available in the `sample_configs` directory of the repository.

python -m vocalocator --data /path/to/directory/containing/trainset/ --config /path/to/config.json --save-path /path/to/model/weight/directory/ --indices /optional/path/to/index/directory
python -m vocalocator.assess --inference --data /path/to/hdf5/dataset.h5 --config /path/to/model_dir/config.json -o /optional/output/path.h5 --index /optional/index/path.npy
Predictions are saved to a dataset named `point_predictions` at the root of the output HDF5 file.
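A minimal sketch for loading these predictions back into Python with `h5py`; the `(n, 2)` shape is an assumption based on the 2-D locations in the dataset format above:

```python
# Sketch: read the point predictions from the assessment output file.
import h5py

with h5py.File("assessment.h5", "r") as f:
    preds = f["point_predictions"][:]

print(preds.shape)  # expected: one 2-D location per assessed sound event
```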
See our dataset website to learn more about and download our public datasets.