sammlapp opened this issue 2 years ago
Because ONNX requires a numeric tensor input, the input will logically be either (a) raw audio samples or (b) a pre-processed 2D representation of the audio such as a spectrogram, potentially with multiple channels. The advantage of passing the audio sample vector (wav) is that all preprocessing parameters are baked into the model; the only thing the user has to get right is the audio sampling rate. Option (b) gives more flexibility because pre-processing does not need to be packaged into the model, but it creates more opportunity for pre-processing operations and parameters to be lost or mis-implemented when the model changes hands.
PyTorch support for STFT in ONNX export is still a work in progress
Apparently "torch.onnx.dynamo_export" will add some ONNX operators. We could also apparently do some custom handling to map onto existing ONNX functions, but I don't fully understand how (see https://github.com/Alexey-Kamenev/tensorrt-dft-plugins/blob/main/tests/test_dft.py#L35 and https://github.com/pytorch/pytorch/issues/81075#issuecomment-1530713416)
Modulus has done something similar https://github.com/NVIDIA/modulus/blob/main/modulus/models/afno/afno.py#L140
If/when we implement something like this, we will need all preprocessing steps (for inference in the exported ONNX model) to be part of the pytorch model, i.e. layers with forward methods. This raises the question of whether we will end up entirely changing from the use of librosa and scipy to directly using the torchaudio API.
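To illustrate "preprocessing as layers with forward methods", here is a minimal sketch using plain `torch.stft` (torchaudio's `Spectrogram`/`MelSpectrogram` transforms would be the drop-in equivalents); the `n_fft`/`hop_length` values are arbitrary placeholders, not our actual preprocessing parameters:

```python
import torch

# Sketch: spectrogram preprocessing as an nn.Module, so it can live
# inside the model and be carried along in an export
class SpectrogramLayer(torch.nn.Module):
    def __init__(self, n_fft=512, hop_length=256):
        super().__init__()
        self.n_fft = n_fft
        self.hop_length = hop_length
        # register the window as a buffer so it travels with the model
        self.register_buffer("window", torch.hann_window(n_fft))

    def forward(self, samples):  # samples: (batch, n_samples)
        spec = torch.stft(
            samples,
            self.n_fft,
            hop_length=self.hop_length,
            window=self.window,
            return_complex=True,
        )
        return spec.abs()  # magnitude spectrogram: (batch, freq, time)

layer = SpectrogramLayer()
spec = layer(torch.randn(2, 32000))
print(spec.shape)  # (batch, n_fft // 2 + 1, n_frames)
```

Whether this layer itself survives ONNX export is exactly the stft-operator-support question tracked in the issues below.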
Apparently this should work now!
there's a new torch issue to follow for fft export to ONNX: https://github.com/pytorch/pytorch/issues/113067
stft + onnx still seems not to be ready (https://github.com/pytorch/pytorch/issues/113067#issuecomment-1892530038)
not sure how stable this library is, but it provides stft, spectrogram, and melspectrogram exportable to ONNX and CoreML https://github.com/adobe-research/convmelspec/tree/main
now follow this: https://github.com/pytorch/pytorch/issues/135087
[ ] We should be able to export a model to ONNX so that someone could run predictions using the model without opensoundscape.
[ ] Second, we should be able to load an ONNX model and generate predictions. If this is best done with a simple torch script rather than by implementing something in opensoundscape, that's fine - we can just add documentation of how to do this
[ ] Third, we should be able to load an ONNX model into opensoundscape such that we could re-train (e.g. warm-starting) within opensoundscape
It may be necessary or at least logical to use torchaudio to incorporate preprocessing steps into the model, as mentioned in #337
Be aware of the numpy & built-in types caveats for the torch.onnx module
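A minimal illustration of that caveat, using `torch.jit.trace` (which shares the same tracing behavior as the TorchScript-based ONNX exporter): Python numbers are evaluated once at trace time, so data-dependent control flow gets frozen into the graph.

```python
import torch

def f(x):
    # .item() yields a Python number: the comparison is evaluated once,
    # at trace time, and the chosen branch is baked into the graph
    if x.sum().item() > 0:
        return x * 2
    return x * -1

# traced with a positive example, so the "* 2" branch is recorded
traced = torch.jit.trace(f, torch.ones(3))
print(traced(-torch.ones(3)))  # still multiplies by 2: tensor([-2., -2., -2.])
```

The same applies to numpy arrays used inside `forward`: they are captured as constants rather than graph inputs.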