MelWeightMatrix - Githubissues

ONNX documentation

For folks (like me) who are new to mel spectrograms in audio signal processing, here's a link that helped me get some context.

Searching for the corresponding torchaudio ops, I found the MelScale op in torchaudio.transforms, which converts an STFT to a mel-scaled STFT (roughly what the MelWeightMatrix is used for once it has been created). Looking at the source for MelScale, it is a matmul (plus some transposes to align the dimensions) between the input STFT and the melscale_fbanks matrix, which is what we expect onnx.MelWeightMatrix to produce.
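To make the "it's a matmul" point concrete, here is a minimal pure-Python sketch of the projection. It is not torchaudio's code: the function name `apply_mel_fbanks` is made up for illustration, and the shapes assume a filter bank of `(n_freqs, n_mels)` applied to a spectrogram of `(n_freqs, n_frames)`.

```python
def apply_mel_fbanks(spec, fbanks):
    """Project a (n_freqs, n_frames) spectrogram onto mel bins.

    fbanks has shape (n_freqs, n_mels); the result has shape
    (n_mels, n_frames), i.e. it is fbanks transposed times spec.
    """
    n_freqs = len(spec)
    if len(fbanks) != n_freqs:
        raise ValueError("spec and fbanks must share the n_freqs dimension")
    n_frames = len(spec[0])
    n_mels = len(fbanks[0])
    return [
        [
            # Weighted sum over frequency bins for one mel bin and one frame.
            sum(fbanks[f][m] * spec[f][t] for f in range(n_freqs))
            for t in range(n_frames)
        ]
        for m in range(n_mels)
    ]
```

With a real tensor library the same thing is a single transposed matrix product; the nested loops are only there to spell out which dimension is contracted.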

So there is almost a 1:1 mapping from the inputs of onnx.MelWeightMatrix to the arguments of melscale_fbanks:

onnx	torch	Comment
num_mel_bins	n_mels
dft_length	n_freqs	n_freqs needs to be passed as floor(dft_length/2)+1, since that is the expected output size
sample_rate	sample_rate
lower_edge_hertz	f_min	both in Hz
upper_edge_hertz	f_max	both in Hz
-	norm	use torch default
-	mel_scale	use torch default
attr: output_datatype	-	depends on the input, so if the attribute is set in ONNX it would need to be handled explicitly
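The table above can be sketched as a small translation helper. The function name `onnx_mel_args_to_torch_kwargs` is hypothetical; only the keyword names in the returned dict come from `torchaudio.functional.melscale_fbanks`. `norm` and `mel_scale` are left out on purpose so that the torch defaults apply, and the ONNX output dtype attribute is not handled here.

```python
def onnx_mel_args_to_torch_kwargs(
    num_mel_bins, dft_length, sample_rate, lower_edge_hertz, upper_edge_hertz
):
    """Translate onnx.MelWeightMatrix inputs into melscale_fbanks kwargs."""
    return {
        # One-sided spectrum size: floor(dft_length / 2) + 1
        "n_freqs": int(dft_length) // 2 + 1,
        "f_min": float(lower_edge_hertz),  # Hz
        "f_max": float(upper_edge_hertz),  # Hz
        "n_mels": int(num_mel_bins),
        "sample_rate": int(sample_rate),
    }
```

Assuming torchaudio is installed, the result could then be passed as `torchaudio.functional.melscale_fbanks(**kwargs)`, followed by an explicit cast if the ONNX node requests a non-default output type.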

One thing to note when actually using this in a model is that onnx.MelWeightMatrix requires:

In the returned matrix, all the triangles (filterbanks) have a peak value of 1.0.

Whereas melscale_fbanks notes that:

For the sake of the numerical compatibility with librosa, not all the coefficients in the resulting filter bank has magnitude of 1.
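To illustrate why the peaks can fall below 1.0, here is a pure-Python sketch that builds triangular filters from the HTK mel formula on a linear frequency grid and then inspects each filter's maximum. It is written from the formula, not copied from torchaudio or ONNX, and all function names are hypothetical. The rescaling step at the end is only one possible way to force unit peaks; it is an assumption, not something either library prescribes, and the result may still differ numerically from what onnx.MelWeightMatrix produces.

```python
import math


def hz_to_mel(f):
    # HTK mel scale
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def triangular_fbanks(n_freqs, f_min, f_max, n_mels, sample_rate):
    """Return an (n_freqs, n_mels) list-of-lists filter bank."""
    # Linear grid of bin frequencies from 0 to Nyquist.
    all_freqs = [i * (sample_rate / 2) / (n_freqs - 1) for i in range(n_freqs)]
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_mels + 2 edge points, equally spaced in mel, converted back to Hz.
    f_pts = [
        mel_to_hz(m_min + k * (m_max - m_min) / (n_mels + 1))
        for k in range(n_mels + 2)
    ]
    fb = [[0.0] * n_mels for _ in range(n_freqs)]
    for i, f in enumerate(all_freqs):
        for j in range(n_mels):
            lo, mid, hi = f_pts[j], f_pts[j + 1], f_pts[j + 2]
            up = (f - lo) / (mid - lo)      # rising edge of the triangle
            down = (hi - f) / (hi - mid)    # falling edge of the triangle
            fb[i][j] = max(0.0, min(up, down))
    return fb


def column_peaks(fb):
    # Maximum weight of each mel filter across all frequency bins.
    return [max(row[j] for row in fb) for j in range(len(fb[0]))]


def rescale_to_unit_peak(fb):
    # Divide each filter by its own maximum so every peak becomes 1.0.
    peaks = column_peaks(fb)
    return [[row[j] / peaks[j] for j in range(len(peaks))] for row in fb]
```

A triangle only reaches exactly 1.0 when a frequency bin lands exactly on its center frequency, which generally does not happen on a linear bin grid, so the sampled peaks end up slightly below 1.0.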

nod-ai / SHARK-ModelDev

MelWeightMatrix #738