Open sammlapp opened 3 years ago
Incorporating basic preprocessing (i.e., spectrogram creation, but not augmentation) into a cross-platform model export (e.g., ONNX) is an important step toward model sharing and cross-platform compatibility. I believe that using torchaudio would allow us to incorporate the Spectrogram/FFT step into the model itself.
Others (if I remember correctly, shyamblast of koogu) have suggested having separate scenarios for training (allow flexibility in augmentation and preprocessing by keeping all preprocessing outside the model) and prediction/inference (incorporate preprocessing into the model so that all parameters are carried with the saved model).
I'm using 'model' here in the sense of a PyTorch model object rather than an opensoundscape.torch.models.cnn.CNN object.
(See also #500, which is blocked until PyTorch adds support for certain preprocessing operations.)
The torchaudio modules implement sox effects, spectrograms, and various audio transforms. I'm not sure whether we should use or rely on these within opensoundscape. Keeping internal implementations of functions that torchaudio already provides may be somewhat redundant, but the torchaudio package does not seem very feature-rich or robust, so it may not be worth using at this point.