nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0

MelWeightMatrix #738

Closed vivekkhandelwal1 closed 3 months ago

PhaneeshB commented 5 months ago

ONNX docs

For folks (like me) who are new to mel spectrograms in audio signal processing, here's a link that helped me get some context.

While searching for associated torchaudio ops, I found that there is a MelScale op in torchaudio.transforms that converts an STFT to a mel-scaled STFT (which is sort of what the MelWeightMatrix is supposed to do after we create it). Looking at the source for MelScale, it's a matmul (plus some transposes to align everything) between the input STFT and the melscale_fbanks matrix, which is what we expect from onnx.MelWeightMatrix.
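To make the "matmul plus some transposes" description concrete, here is a minimal pure-Python sketch. The helper names (`transpose`, `matmul`, `apply_mel_scale`) are hypothetical and written only for illustration; the shapes assume a spectrogram laid out as (n_freqs, time) and a filter bank laid out as (n_freqs, n_mels), which is how I read the MelScale source.

```python
def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    b_t = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_t] for row in a]


def apply_mel_scale(specgram, fbanks):
    """Project a (n_freqs, time) spectrogram onto mel bins.

    fbanks is (n_freqs, n_mels); the result is (n_mels, time).
    """
    # (time, n_freqs) @ (n_freqs, n_mels) -> (time, n_mels), then transpose back.
    return transpose(matmul(transpose(specgram), fbanks))


# Toy data: 3 frequency bins, 2 time frames, 2 mel bins.
specgram = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fbanks = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
mel_specgram = apply_mel_scale(specgram, fbanks)
```

In torch terms this should correspond roughly to `torch.matmul(specgram.transpose(-1, -2), fb).transpose(-1, -2)`, though the exact expression in the torchaudio source may differ between versions.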

So there is almost a 1:1 mapping from onnx.MelWeightMatrix to melscale_fbanks

| onnx | torch | Comment |
| --- | --- | --- |
| num_mel_bins | n_mels | |
| dft_length | n_freqs | n_freqs needs to be passed as floor(dft_length/2)+1, as that is the expected output size |
| sample_rate | sample_rate | |
| lower_edge_hertz | f_min | both in Hz |
| upper_edge_hertz | f_max | both in Hz |
| - | norm | use torch default |
| - | mel_scale | use torch default |
| attr: output_datatype | - | depends on input, so if the attr is set in onnx it would need to be handled explicitly |
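The mapping above can be sketched as a small argument-translation helper. The function name `onnx_mel_args_to_torchaudio` is hypothetical; the keyword names in the returned dict follow the `melscale_fbanks` parameters listed in the mapping, and `norm` and `mel_scale` are deliberately left out so the torch defaults apply. The ONNX output data type attribute is not handled here.

```python
def onnx_mel_args_to_torchaudio(num_mel_bins, dft_length, sample_rate,
                                lower_edge_hertz, upper_edge_hertz):
    """Translate onnx.MelWeightMatrix inputs into melscale_fbanks kwargs."""
    return {
        # The ONNX output has floor(dft_length / 2) + 1 rows, one per
        # non-redundant frequency bin, so that is what n_freqs must be.
        "n_freqs": dft_length // 2 + 1,
        "f_min": float(lower_edge_hertz),  # Hz
        "f_max": float(upper_edge_hertz),  # Hz
        "n_mels": int(num_mel_bins),
        "sample_rate": int(sample_rate),
    }


kwargs = onnx_mel_args_to_torchaudio(
    num_mel_bins=8,
    dft_length=16,
    sample_rate=8192,
    lower_edge_hertz=0,
    upper_edge_hertz=4096,
)
```

With torchaudio installed, the resulting dict could be passed as `torchaudio.functional.melscale_fbanks(**kwargs)`, which returns a matrix of shape (n_freqs, n_mels).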

One thing to note when actually using this in a model is that onnx.MelWeightMatrix requires:

In the returned matrix, all the triangles (filterbanks) have a peak value of 1.0.

Whereas melscale_fbanks notes that:

For the sake of the numerical compatibility with librosa, not all the coefficients in the resulting filter bank has magnitude of 1.
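The difference is easy to see in a small experiment. The sketch below is a simplified pure-Python rendering of triangular filters on the HTK mel scale, sampled at linearly spaced frequency bins. It is not the torchaudio source, and all function names are hypothetical. Because the triangle centers rarely land exactly on a frequency bin, the sampled peaks come out below 1.0. The `rescale_peaks_to_one` step is only one possible way to force unit peaks and is an assumption on my part, not what the ONNX reference or the torch-mlir lowering necessarily does.

```python
import math


def hz_to_mel_htk(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz_htk(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def linspace(start, stop, num):
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def triangular_fbanks(n_freqs, f_min, f_max, n_mels, sample_rate):
    """Return an (n_freqs, n_mels) matrix of sampled triangular filters."""
    all_freqs = linspace(0.0, sample_rate / 2.0, n_freqs)
    # Band edges are equally spaced on the mel scale, then mapped back to Hz.
    m_pts = linspace(hz_to_mel_htk(f_min), hz_to_mel_htk(f_max), n_mels + 2)
    f_pts = [mel_to_hz_htk(m) for m in m_pts]
    fb = [[0.0] * n_mels for _ in range(n_freqs)]
    for j in range(n_mels):
        left, center, right = f_pts[j], f_pts[j + 1], f_pts[j + 2]
        for i, f in enumerate(all_freqs):
            up = (f - left) / (center - left)       # rising slope
            down = (right - f) / (right - center)   # falling slope
            fb[i][j] = max(0.0, min(up, down))
    return fb


def column_peaks(fb):
    return [max(row[j] for row in fb) for j in range(len(fb[0]))]


def rescale_peaks_to_one(fb):
    """Divide each filter by its own peak (all-zero filters are left as zero)."""
    peaks = column_peaks(fb)
    return [[row[j] / peaks[j] if peaks[j] > 0.0 else 0.0
             for j in range(len(peaks))] for row in fb]


fb = triangular_fbanks(n_freqs=9, f_min=0.0, f_max=8000.0, n_mels=4,
                       sample_rate=16000)
raw_peaks = column_peaks(fb)
unit_peaks = column_peaks(rescale_peaks_to_one(fb))
```

For this toy configuration every entry of `raw_peaks` is strictly below 1.0, while `unit_peaks` is 1.0 for every filter, so a lowering built directly on melscale_fbanks would need some extra handling to match the ONNX peak-value requirement exactly.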

PhaneeshB commented 4 months ago

This PR, https://github.com/llvm/torch-mlir/pull/3503, adds support for the ONNX-to-Torch lowering of the op.

TorchToLinalg support is still pending.