kitzeslab / opensoundscape

Open source, scalable software for the analysis of bioacoustic recordings
http://opensoundscape.org
MIT License
127 stars 14 forks

Support/use torchaudio? #337

Open sammlapp opened 3 years ago

sammlapp commented 3 years ago

The torchaudio modules implement sox effects, spectrograms, and various audio transforms. I'm not sure whether we should use or rely on these within opensoundscape. Maintaining internal implementations of functions that torchaudio already provides may be somewhat redundant, but the torchaudio package does not seem very full-featured or robust, so it may not be worth using at this point.

sammlapp commented 1 year ago

Incorporating basic preprocessing (i.e., spectrogram creation, but not augmentation) into a cross-platform model export (e.g., ONNX) is an important step toward model sharing and cross-platform compatibility. I believe that using torchaudio would allow us to incorporate Spectrogram/FFT into the model.

Others (if I remember correctly, shyamblast of koogu) have suggested having separate scenarios for training (allow flexibility in augmentation and preprocessing by keeping all preprocessing outside the model) and prediction/inference (incorporate preprocessing into the model so that all parameters are carried with the saved model).

I'm using 'model' here in the sense of a PyTorch model object rather than an opensoundscape.torch.models.cnn.CNN object.

sammlapp commented 1 year ago

(See also #500, which is blocked until PyTorch adds support for certain preprocessing operations.)