keunwoochoi / torchaudio-contrib

A test bed for updates and new features | pytorch/audio
169 stars 22 forks source link

Downmix - api design #54

Open keunwoochoi opened 5 years ago

keunwoochoi commented 5 years ago

Regarding #45 . Ok my quick suggestion.

Any thought?

faroit commented 5 years ago

i like that

keunwoochoi commented 5 years ago

good!

hagenw commented 5 years ago

I would also vote for this.

Two small remarks:

keunwoochoi commented 5 years ago

the current naming behavior (e.g. StretchSpecTime)

Actually, shouldn't that be also TimeSpecStretch and hypotheticallyTimeStretch, SpecStretch? I think they'd be aligned better with other names like DbToAmplitude, MuLawEncoding as well as read more naturally. (cc' @ksanjeevan )

mean across them

Totally agreed. Didn't really mean sum > mean.

faroit commented 5 years ago

I would again propose to offer a mono downmix option in the Spectrogram layer. My feeling is that by just having this option people would less often do a downmix in the time domain which does not preserve the energy and can lead to inferior processing results when then model is operating in the single channel domain but input and output is stereo (source separation, denoising etc).

keunwoochoi commented 5 years ago

less often do a downmix in the time domain

@faroit That's a good point, except.. maybe a new issue for that topic? :^P haha

hagenw commented 5 years ago

I just looked for two random PyTorch examples and found: SyncBatchNorm and DataLoader which are both classes. That looks not that consistent to me ;) If you look at os in the Python standard library you find things like mkdir, waitpid, unsetenv. So this would be verb + noun.

At the moment I would vote for DownmixSpectrum and StrectSpecTime. Otherwise SpectrumDownmixer and SpecTimeStretcher would probably be more consitent with PyTorch than SpectrumDownmix and SpecTimeStretch.

But I don't have a strong opinion at the moment and will continue to look for other examples.

keunwoochoi commented 5 years ago

OK, I asked an anonymous native English speaker and music researcher. We go with DownmixWaveform and DownmixSpectrum!

ksanjeevan commented 5 years ago

OK, I asked an anonymous native English speaker and music researcher. We go with DownmixWaveform and DownmixSpectrum!

👍

I like StretchSpecTime since, to me, it reads more intuitively (i.e. "stretch the spectrogram in time dimension", it's like this sentence in variable name :). That said, TimeStretch or SpecStretch also work since the time stretching with phase vocoder has to use the complex spectrogram, so there shouldn't be much ambiguity and it's more compact naming.

keunwoochoi commented 5 years ago

Ok. Hmm, if it's always on spectrogram, that information is actually included in the argument name. And I actually misunderstood it does it along both time and spectral axis (which would be still a weird name though). One can argue that that's same for downmix layers but in that case we have to specify so probably a different story.

This issue was for naming of downmix but let's just do it here quickly. I'd vote for TimeStretch just because it is such a popular name (like BatchNorm). Later, if we have a pitch shifter, I'd vote for, well, PitchShift for the same reason.

f0k commented 5 years ago

Two remarks: