keunwoochoi / torchaudio-contrib

A test bed for updates and new features | pytorch/audio
169 stars 22 forks source link

Functional interface #35

Closed f0k closed 5 years ago

f0k commented 5 years ago

In https://github.com/keunwoochoi/torchaudio-contrib/pull/34#discussion_r271428800, we discussed what purpose we actually see for the functional interface.

My stance would be to follow torch.nn.functional: It provides the operations that the modules in torch.nn are made of, in order to make it easier to reuse implementations for new modules. Any operators needed for the operations are provided by the modules. For example, torch.nn.functional.conv2d expects the inputs and weights to be given, and torch.nn.Conv2d creates the weights in its constructor and passes them to torch.nn.functional.conv2d in its forward() method. To give an example for our package, torch.stft is a function that expects the input and window to be given, and torchaudio.STFT would create the window in its constructor and pass it to torch.stft in the forward pass. If we create our own torchaudio.functional.stft to add some functionality (such as supporting higher-dimensional tensors), that function would still expect a window to be given, and not create it. Same for the mel filterbank: The functional interface for applying a mel filterbank would not create the mel filterbank, but expect it to be given as an argument. The corresponding class creates the filterbank in its constructor and passes it to the function.

Right now, the functional interface deviates from that. It's designed such that stft(x, *args) is the same as STFT(*args)(x). Was this a conscious choice, and why?

ksanjeevan commented 5 years ago

I think it was a conscious choice (as can be seen in the README) based on the cases where a user wants to compute say the spectrogram outside of a module without having to do spec = Spectrogram(*args).cuda()(x) (and the overhead of creating a module just for an operation). At least I think this is how the conversation went, @faroit and @keunwoochoi if I'm missing something!

After giving it more thought and discussion I tend to agree that it would make more sense to follow the torch.nn approach, and if someone has the use case above then they can create the window/filterbank/whatever and pass it? Although that could end up being quite a bit of composition depending on the transform...

...A combination of these approaches would have a spectrogram(x, window, *args), melspectrogram(x, window, melfilterbank, *args) that handle the composition for the functionals while still requiring the parameters to be passed but that seems a bit much?

keunwoochoi commented 5 years ago

Hmm, ok I'm quite sold. I'd summarize

If there are some parameters to be initialized --> Layers, otherwise, Functional

with a nice example of how to create windows in the docstrings.

On composition - @ksanjeevan I thought it's only a matter for Layers?

ksanjeevan commented 5 years ago

On composition - @ksanjeevan I thought it's only a matter for Layers?

Yes definitely, I just meant that if the functional.melspectrogram(x, *args) had to also compute it then it was also doing composition (of functionals, and not modules).

with a nice example of how to create windows in the docstrings.

👍

f0k commented 5 years ago

I think it was a conscious choice (as can be seen in the README) based on the cases where a user wants to compute say the spectrogram outside of a module without having to do spec = Spectrogram(*args).cuda()(x) (and the overhead of creating a module just for an operation).

I see, for interactive exploration sessions, functions may be useful. But if the function created the window, they might not be aware of the overhead of creating a window in every call. A second use of the functional interface in PyTorch is in defining models. Some examples only create all learnable layers in the constructor of their Module, and then use the functional interface for nonlinearities and pooling in the forward() method. If the stft function creates the window, people might use it directly in the forward method and not know that they should have created the window in the constructor.

with a nice example of how to create windows in the docstrings

Yes, that should settle it! The stft function wouldn't create the window by itself, but include an example that applies a windowed stft to a signal, and maybe has a "see also" for the window function (if someone wants to play with intersphinx).

if the functional.melspectrogram(x, *args) had to also compute it then it was also doing composition (of functionals, and not modules)

Yes, that's what I suggested in #20, but maybe we don't need this. The apply_filterbank function (or whatever we call it) can have an example in the docstring that combines it with an stft and mel filterbank. If it turns out that people want to use the functional interface a lot, we can still add shortcuts later. /edit: Or if there are functions that we think could at some point be implemented more efficiently than by composing two functions. For example, a magnitude spectrogram could be computed more efficiently than by applying a real-to-complex STFT and computing the norm afterwards. If we provide a function that encapsulates computing a magnitude spectrogram, then anybody using it will profit from an updated implementation, should there be one. If we only provide the components, then anybody using them would need to change their code to profit from an updated implementation.

keunwoochoi commented 5 years ago

👍 Agreed, thanks for letting us know about those.

If it turns out that people want to use the functional interface a lot, we can still add shortcuts later. ... If we provide a function that encapsulates computing a magnitude spectrogram, then anybody using it will profit from an updated implementation

This is very true!

ksanjeevan commented 5 years ago

Ok sounds like we're aligned on this :+1: . I'll wait for the other current issues/discussions to come to a resolution and then update the PR!

keunwoochoi commented 5 years ago

Seems like we agreed with the suggestion. All the current implementation follows this, and we'll keep the new PR's to be the same.