keunwoochoi / torchaudio-contrib

A test bed for updates and new features | pytorch/audio
169 stars 22 forks source link

API: Composition vs. Inheritance #20

Closed f0k closed 5 years ago

f0k commented 5 years ago

The API currently favors inheritance over composition. For example, Melspectrogram extends Spectrogram, and melspectrogram calls spectrogram. In a similar way, MFCC would extend Melspectrogram.

This means, for example, that one cannot reuse the Melspectrogram implementation to work on a custom STFT class. I don't have a specific use case in mind, just a sense of a missed opportunity.

The alternative would be to have the STFT class, a ComplexMagnitude class, a MelFilterbank class (to be derived from a generic Filterbank class with corresponding defaults), and a DCT class, and have convenience functions such as spectrogram and mfcc that return a nn.Sequential composition of the models.

What do you think?

keunwoochoi commented 5 years ago

I think that's a great idea, actually that's also closer to our original idea - to have MelFilterbank class but I didn't come up with composing them. Which sounds even better.

faroit commented 5 years ago

yes, we discussed this before but didn't think that far ;-) I like your alternative as it would make separation/spectogram filtering tasks more elegant such as

X = stft(x)
Y_hat = model(spectrogram(X))

have convenience functions such as spectrogram and mfcc that return a nn.Sequential composition of the models.

yes, sounds really nice, indeed. However it would add a lot of overhead, when eg. spectrogram(X) just would return the modulus, no?

ksanjeevan commented 5 years ago

Cool! having composition would also make adding the phase vocoder easier.

f0k commented 5 years ago

I like your alternative as it would make separation/spectogram filtering tasks more elegant

Ah yes, of course, that's a good use case. Sometimes you will want to have both the complex and the magnitude spectrogram, or both the full and the mel-scaled spectrogram, so it's good to have the transformations available separately.

X = stft(x)
Y_hat = model(spectrogram(X))

I assume you're talking of the functional interface now. It would still be X = spectrogram(x), but also enable X = magnitude(stft(x)) or X = power(stft(x)).

For the nn.Module interface, we would have convenience functions like:

def melspectrogram(all, the, parameters):
    return nn.Sequential([STFT(some, parameters), Magnitude(), MelFilterbank(some, other, parameters)])

Or maybe:

def melspectrogram(all, the, parameters):
    return nn.Sequential([spectrogram(some, parameters), MelFilterbank(some, other, parameters)])

Hmm, the compositions will become a bit deep when they each wrap one more nn.Sequential. I guess I'm still thinking too much in terms of Lasagne :) So maybe it should be:

def melspectrogram(all, the, parameters):
    result = spectrogram(some, parameters)
    result.add_module(MelFilterbank(some, other, parameters))
    return result

(It's a pity that add_module() does not return self.)