Downmix - api design - Githubissues

keunwoochoi commented 5 years ago

Regarding #45 . Ok my quick suggestion.

SpectrumDownmix(mag_specgrams) which does what's suggested in #45.
WaveformDownmix(waveforms) which simply sums up the input along the channel-axis.

Any thought?

faroit commented 5 years ago

i like that

keunwoochoi commented 5 years ago

good!

hagenw commented 5 years ago

I would also vote for this.

Two small remarks:

To be more in line with the current naming behavior (e.g. StretchSpecTime) I would go with: DownmixSpectrum and DownmixWaveform.
For WaveformDownmix(waveforms) I would expect that it not only sums up all channels, but calculates the mean across them.

keunwoochoi commented 5 years ago

the current naming behavior (e.g. StretchSpecTime)

Actually, shouldn't that be also TimeSpecStretch and hypotheticallyTimeStretch, SpecStretch? I think they'd be aligned better with other names like DbToAmplitude, MuLawEncoding as well as read more naturally. (cc' @ksanjeevan )

mean across them

Totally agreed. Didn't really mean sum > mean.

faroit commented 5 years ago

I would again propose to offer a mono downmix option in the Spectrogram layer. My feeling is that by just having this option people would less often do a downmix in the time domain which does not preserve the energy and can lead to inferior processing results when then model is operating in the single channel domain but input and output is stereo (source separation, denoising etc).

keunwoochoi commented 5 years ago

less often do a downmix in the time domain

@faroit That's a good point, except.. maybe a new issue for that topic? :^P haha

On sum vs mean, I'd conclude we're fine with mean.
On the names spectrum_downmix vs downmix_spectrum, any opinions everyone? The point is which would be more consistent along with other APIs.
- I think we're sort of in favor of readable names such as complex_norm, which would be usually adjective + noun. Hence I think we also should change StretchSpecTime to SpecTimeStretch. Vote vote! @hagenw @f0k @deNsuh @hagenw @ksanjeevan

hagenw commented 5 years ago

I just looked for two random PyTorch examples and found: SyncBatchNorm and DataLoader which are both classes. That looks not that consistent to me ;) If you look at os in the Python standard library you find things like mkdir, waitpid, unsetenv. So this would be verb + noun.

At the moment I would vote for DownmixSpectrum and StrectSpecTime. Otherwise SpectrumDownmixer and SpecTimeStretcher would probably be more consitent with PyTorch than SpectrumDownmix and SpecTimeStretch.

But I don't have a strong opinion at the moment and will continue to look for other examples.

keunwoochoi commented 5 years ago

OK, I asked an anonymous native English speaker and music researcher. We go with DownmixWaveform and DownmixSpectrum!

ksanjeevan commented 5 years ago

OK, I asked an anonymous native English speaker and music researcher. We go with DownmixWaveform and DownmixSpectrum!

👍

I like StretchSpecTime since, to me, it reads more intuitively (i.e. "stretch the spectrogram in time dimension", it's like this sentence in variable name :). That said, TimeStretch or SpecStretch also work since the time stretching with phase vocoder has to use the complex spectrogram, so there shouldn't be much ambiguity and it's more compact naming.

keunwoochoi commented 5 years ago

Ok. Hmm, if it's always on spectrogram, that information is actually included in the argument name. And I actually misunderstood it does it along both time and spectral axis (which would be still a weird name though). One can argue that that's same for downmix layers but in that case we have to specify so probably a different story.

This issue was for naming of downmix but let's just do it here quickly. I'd vote for TimeStretch just because it is such a popular name (like BatchNorm). Later, if we have a pitch shifter, I'd vote for, well, PitchShift for the same reason.

f0k commented 5 years ago

Two remarks:

If we have something that needs a different class for spectrograms and waveforms, then "verb + noun" ensures the two classes appear next to each other in an alphabetic listing. That's sometimes useful.
Will Spectrum be replaced by what we decide to use in #36? E.g., Specgram? I understand that downmixing operates on each spectrum separately, or actually on each bin, but I think we should avoid using too many different terms. Otherwise, for consistency, it should also be DownmixSample instead of DownmixWaveform.

keunwoochoi / torchaudio-contrib

Downmix - api design #54