Closed zh217 closed 5 years ago
@zh217 can we have small code snippet of the load/save?
Sure.
import torch
from torchaudio.transforms import MelSpectrogram

ms = MelSpectrogram()
ms(torch.randn(1, 1000))

ms2 = MelSpectrogram()
ms2.load_state_dict(ms.state_dict())
Thanks for the snippet 😄! Yes, I am also not sure why it is done that way:
https://github.com/pytorch/audio/pull/72/files#diff-a3428ec41790ed12faea36fcc6dd8f06R228
The PR says that they do not want to reconstruct the fb matrix every time `__call__` is used, so they initialize it in `__init__` or on the first input.
We could remove the lazy initialization by making `n_stft` required, so that the fb matrix is always created in `__init__` (the user must know the dimensions of their matrix). Then `__call__` just checks that the input dimensions match expectations and does a matrix multiplication.
Thoughts? @zh217 @dhpollack @vincentqb @cpuhrsch
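For illustration, here is a minimal plain-Python sketch of the eager-initialization idea above (hypothetical names, standing in for an `nn.Module`; not torchaudio's actual code): with `n_stft` required, the fb matrix gets its full shape in `__init__`, so `state_dict` round-trips without a warm-up call.

```python
# Hypothetical sketch of "eager" initialization: n_stft is required, so the
# fb matrix is fully shaped at construction time and state_dict round-trips.
class EagerMelScale:
    def __init__(self, n_stft, n_mels=128):
        # fb is created here, not lazily on the first input: n_stft x n_mels
        self.fb = [[0.0] * n_mels for _ in range(n_stft)]

    def state_dict(self):
        return {"fb": self.fb}

    def load_state_dict(self, sd):
        # shapes already match because both modules fixed them in __init__
        assert len(sd["fb"]) == len(self.fb), "fb shapes must match"
        self.fb = sd["fb"]

ms = EagerMelScale(n_stft=201)
ms2 = EagerMelScale(n_stft=201)
ms2.load_state_dict(ms.state_dict())  # succeeds without calling ms2 first
```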
Making `n_fft` required sounds reasonable to me. I can't think of a use case where I'd need `n_fft` to be dynamically determined at runtime, and even if there were one, the current implementation doesn't work with a changing `n_fft` anyway.
The `MelSpectrogram` could derive `n_stft` from `n_fft`:
https://github.com/pytorch/audio/blob/master/torchaudio/transforms.py#L202

self.mel_scale = MelScale(self.n_mels, self.sample_rate, self.f_min,
                          self.f_max, n_stft=self.n_fft // 2 + 1)
which would make your code snippet run correctly:

import torch
from torchaudio.transforms import MelSpectrogram

ms = MelSpectrogram()
ms(torch.randn(1, 1000))

ms2 = MelSpectrogram()
ms2.load_state_dict(ms.state_dict())
We could also keep `n_stft` optional for loading/saving by specifying the shape of the destination module (`ms2`) for `MelScale`:

import torch
from torchaudio.transforms import MelScale

ms = MelScale()
ms(torch.randn(1, 1000, 100))

ms2 = MelScale(n_stft=1000)
ms2.load_state_dict(ms.state_dict())
The lazy evaluation means that we do not have to specify any dimensions when defining MelScale, and they are then inferred at evaluation time against the matrix used, is that right? This would seem like a nice feature to have for MelScale.
However, for MelSpectrogram, we already know the size of the matrix at initialization, and so we could infer the size at that time. Is that correct? If this is so, then we could go for that as a default behavior since waiting does not give us more information.
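The eager inference for MelSpectrogram comes down to one line of arithmetic. A small sketch (the helper name is illustrative; the values assume torchaudio's default `n_fft` of 400):

```python
# A one-sided STFT over n_fft points yields n_fft // 2 + 1 frequency bins,
# so MelSpectrogram can compute MelScale's n_stft at construction time.
def n_stft_from_n_fft(n_fft):
    return n_fft // 2 + 1

print(n_stft_from_n_fft(400))   # 201 (torchaudio's default n_fft is 400)
print(n_stft_from_n_fft(1024))  # 513
```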
Going to close this: as of #246, we can now load from state_dict for MelSpectrogram with this code snippet https://github.com/pytorch/audio/issues/245#issuecomment-522602035
We can also load from state_dict for MelScale (see test_melscale_load_save and test_melspectrogram_load_save).
We concluded that inferring the matrix size at runtime for MelScale (i.e. lazy initialization) is a nice property to have, as long as it does not impede other use cases. Feel free to reopen if we did not address anything.
Lazy initialization refers to the declaration here and the initialization here. When loading a model saved with torch.save(), this causes problems:
The problem can be sidestepped by running the model at least once before loading the weights, but I'd rather not rely on such a trick.
Why is this being lazily initialized anyway?
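A plain-Python sketch of the failure mode and the "run once first" workaround described above (hypothetical names, not torchaudio's actual code; a list stands in for the fb buffer):

```python
# Hypothetical sketch: lazy buffer creation breaks state_dict loading until
# one warm-up call has given the destination module's buffer its shape.
class LazyMelScale:
    def __init__(self):
        self.fb = None  # filter bank deferred until the first input

    def __call__(self, n_freq):
        if self.fb is None:
            self.fb = [0.0] * n_freq  # size inferred from the first input
        return self.fb

    def state_dict(self):
        return {"fb": list(self.fb)}

    def load_state_dict(self, sd):
        # mimics the size check a real buffer load performs
        if self.fb is None or len(self.fb) != len(sd["fb"]):
            raise RuntimeError("fb size mismatch: module never initialized")
        self.fb = list(sd["fb"])

src = LazyMelScale()
src(201)  # src has been run once, so its fb exists

dst = LazyMelScale()
try:
    dst.load_state_dict(src.state_dict())  # fails: dst.fb is still None
    load_failed = False
except RuntimeError:
    load_failed = True

dst(201)  # the workaround: one forward pass gives fb its shape
dst.load_state_dict(src.state_dict())  # now loading succeeds
```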