Merging items torchaudio-contrib that are not yet in torchaudio

pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

https://pytorch.org/audio

BSD 2-Clause "Simplified" License


Merging items torchaudio-contrib that are not yet in torchaudio #259

Open vincentqb opened 5 years ago

vincentqb commented 5 years ago

When looking into torchaudio-contrib, I see the following items that are not yet in torchaudio:

[ ] Harmonic-Percussive Source Separation
[ ] dB to amplitude functional and layer (we already have amplitude to dB)
[ ] mel-hertz functionals
[ ] ~Complex Norm layer (in discussion in pytorch/pytorch#755)~
[ ] Time stretch layer (the phase vocoder functional is already available)
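For reference, the dB-to-amplitude and mel-hertz items are simple elementwise conversions. The sketch below shows the math in plain Python rather than as torch functionals. The function names are hypothetical (not torchaudio's API), the clamping and `top_db` handling of a real amplitude-to-dB transform are omitted, and the mel conversion assumes the HTK formula (librosa defaults to the Slaney variant instead).

```python
import math

def amplitude_to_db(x, ref=1.0, power=True):
    """Hypothetical helper: 10*log10 for power values, 20*log10 for amplitudes (no clamping)."""
    factor = 10.0 if power else 20.0
    return factor * math.log10(x / ref)

def db_to_amplitude(db, ref=1.0, power=True):
    """Hypothetical inverse of amplitude_to_db: ref * 10**(db / factor)."""
    factor = 10.0 if power else 20.0
    return ref * 10.0 ** (db / factor)

def hz_to_mel(f):
    """HTK mel scale (assumption; librosa's default is Slaney)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of the HTK mel scale above."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

For example, `db_to_amplitude(20.0)` gives 100.0 for a power value, while `db_to_amplitude(20.0, power=False)` gives 10.0 for an amplitude.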

See

torchaudio-contrib merging plans in #110
keunwoochoi/torchaudio-contrib#61
keunwoochoi/torchaudio-contrib#71
sox discussion in #260.

keunwoochoi commented 5 years ago

I'm thinking of working on dB-to-amplitude and HPSS. @ksanjeevan Hey, would you be interested in the time stretch layer or any other augmentation layers?
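HPSS is commonly done by median filtering (this is the approach behind librosa's `decompose.hpss`): harmonic energy is smooth along time, percussive energy is smooth along frequency. Below is a minimal plain-Python sketch under that assumption, using soft masks and simply clipping the filter window at the edges. `hpss` and its parameters are hypothetical and not an existing torchaudio function.

```python
from statistics import median

def _median_filter_rows(mat, width):
    """Median-filter each row; windows are clipped at the edges (a simplification)."""
    half = width // 2
    out = []
    for row in mat:
        n = len(row)
        out.append([median(row[max(0, j - half):min(n, j + half + 1)]) for j in range(n)])
    return out

def _transpose(mat):
    return [list(col) for col in zip(*mat)]

def hpss(mag, kernel_size=17, power=2.0):
    """Split an n_freq x n_frames magnitude spectrogram into (harmonic, percussive)."""
    harm = _median_filter_rows(mag, kernel_size)  # smooth along time
    perc = _transpose(_median_filter_rows(_transpose(mag), kernel_size))  # smooth along frequency
    h_out, p_out = [], []
    for m_row, h_row, p_row in zip(mag, harm, perc):
        h_line, p_line = [], []
        for m, h, p in zip(m_row, h_row, p_row):
            denom = h ** power + p ** power
            mask_h = h ** power / denom if denom > 0 else 0.5  # soft (Wiener-style) mask
            h_line.append(m * mask_h)
            p_line.append(m * (1.0 - mask_h))
        h_out.append(h_line)
        p_out.append(p_line)
    return h_out, p_out
```

Because the two masks sum to one, the harmonic and percussive outputs always add back up to the input magnitude.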

ksanjeevan commented 5 years ago

@keunwoochoi I think, as far as augmentation goes, Time Stretching and Pitch Shifting layers would be great. Both would be based on the phase_vocoder, which we already have, and Pitch Shifting would also use the sinc resampling (which we also have!). Pitch Shifting (following librosa) might be trickier since it can't be applied directly to the spectrogram, but with some discussion we could get something working.
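The phase-vocoder idea behind both layers can be sketched as follows. Read the complex spectrogram at fractional frame positions spaced by `rate`, interpolate magnitudes linearly, and accumulate phase using the wrapped deviation from each bin's expected per-hop advance. This plain-Python version mirrors the librosa-style algorithm on nested lists. `phase_vocoder` and `pitch_shift_params` are hypothetical names. The pitch-shift helper assumes librosa's convention (stretch by `rate`, then resample from `sr / rate` back to `sr`) and only computes those two numbers; it does not resample.

```python
import cmath
import math

def phase_vocoder(spec, rate, hop_length):
    """Time-stretch a complex spectrogram given as n_freq x n_frames nested lists.

    rate > 1 shortens (speeds up), rate < 1 lengthens. Assumes n_freq = n_fft // 2 + 1 >= 2.
    """
    n_freq, n_frames = len(spec), len(spec[0])
    # Expected phase advance per hop for bin k: pi * hop * k / (n_freq - 1) = 2*pi*hop*k/n_fft.
    phase_advance = [math.pi * hop_length * k / (n_freq - 1) for k in range(n_freq)]
    # Pad one zero frame so that frame i + 1 always exists.
    padded = [row + [0j] for row in spec]
    # Fractional read positions 0, rate, 2*rate, ... strictly below n_frames.
    steps = [j * rate for j in range(math.ceil(n_frames / rate)) if j * rate < n_frames]
    phase_acc = [cmath.phase(row[0]) for row in spec]
    out = [[0j] * len(steps) for _ in range(n_freq)]
    for j, step in enumerate(steps):
        i = int(step)
        alpha = step - i
        for k in range(n_freq):
            a, b = padded[k][i], padded[k][i + 1]
            mag = (1.0 - alpha) * abs(a) + alpha * abs(b)
            out[k][j] = cmath.rect(mag, phase_acc[k])
            # Deviation from the expected advance, wrapped to [-pi, pi].
            dphase = cmath.phase(b) - cmath.phase(a) - phase_advance[k]
            dphase -= 2.0 * math.pi * round(dphase / (2.0 * math.pi))
            phase_acc[k] += phase_advance[k] + dphase
    return out

def pitch_shift_params(n_steps, sr, bins_per_octave=12):
    """Return (stretch_rate, resample_orig_sr) for a shift of n_steps, librosa-style."""
    rate = 2.0 ** (-float(n_steps) / bins_per_octave)
    return rate, float(sr) / rate
```

For instance, shifting up 12 semitones at 16 kHz gives a stretch rate of 0.5 (twice as long), followed by resampling from 32 kHz to 16 kHz, which restores the duration and doubles the pitch.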

I have also implemented Time/Frequency Masking as described in SpecAugment, which I think could be interesting. These four could be a good start.
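SpecAugment-style masking replaces a random band of consecutive frequency channels or time frames with a constant. Here is a minimal plain-Python sketch, assuming a spectrogram stored as n_freq x n_frames nested lists, a band width drawn uniformly from 0 to `max_width`, and a start position chosen so the band fits. `specaugment_mask` and its parameters are hypothetical; this is not the implementation mentioned above.

```python
import random

def specaugment_mask(spec, max_width, axis, rng, mask_value=0.0):
    """Return a copy of spec with one random band set to mask_value.

    axis=0 masks frequency rows (frequency masking); axis=1 masks time columns (time masking).
    """
    n_freq, n_frames = len(spec), len(spec[0])
    size = n_freq if axis == 0 else n_frames
    width = rng.randint(0, min(max_width, size))  # inclusive bounds
    start = rng.randint(0, size - width)          # band [start, start + width) fits
    out = [row[:] for row in spec]                # leave the input untouched
    for idx in range(start, start + width):
        if axis == 0:
            out[idx] = [mask_value] * n_frames
        else:
            for row in out:
                row[idx] = mask_value
    return out
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a given seed.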

vincentqb commented 5 years ago

> I have also implemented Time/Frequency Masking as described in SpecAugment, which I think could be interesting. These four could be a good start.

I want to make sure that torchaudio has what you need if you want to open pull requests. :) Is anything available in torchaudio-contrib but missing from torchaudio that would prevent you from doing so?

keunwoochoi commented 5 years ago

+1 for making a PR directly whenever it's an option.

ksanjeevan commented 5 years ago

@vincentqb Going by #37, I was thinking of first opening a PR for all four in torchaudio-contrib and making sure they were well received and well tested. But yeah, nothing would be blocking, so I can open it here directly, given that it's SpecAugment.

keunwoochoi commented 5 years ago

Right, that’s how we planned it, but I guess we can be more flexible? At the moment I don’t have a strong opinion about doing it here first :)

vincentqb commented 5 years ago

We're also flexible here, and want to be compatible with your workflow. We're really excited about your work, so we want to make sure you can have as much impact as you want :)

ksanjeevan commented 5 years ago

@vincentqb Great! So I'll make a pull request with the four above (time stretch, pitch shift, time masking, frequency masking), and we can all discuss changes in the context of the PR, such as where to put them, the pitch shift details, etc. 👍

keunwoochoi commented 5 years ago

@ksanjeevan Perfect, that'd be great!