Open vincentqb opened 5 years ago
I'm thinking of dB-to-amplitude and HPSS. @ksanjeevan Hey, would you be interested in a time stretch layer or any other augmentation layers?
@keunwoochoi I think as far as augmentation goes, Time Stretching and Pitch Shifting layers would be great. They would both be based on the phase_vocoder, which we have, as well as the sinc resampling for the latter (which we also have!). Pitch Shifting (following librosa) might be trickier since it can't be applied directly to the spectrogram, but with some discussion we could get something working.
I had also implemented Time/Frequency Masking as described by SpecAugment which I think would be interesting? These four could be a good start.
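For reference, a rough sketch of two of the ideas in the comment above, in plain Python so it runs without any dependencies. It is not the torchaudio or torchaudio-contrib implementation. `pitch_shift_plan` and `mask_along_axis` are hypothetical names, the spectrogram is a nested list (frequency rows by time columns) rather than a tensor, and the pitch shift arithmetic assumes the librosa-style approach of time stretching and then resampling back to the original rate.

```python
import random


def pitch_shift_plan(n_steps, sample_rate, bins_per_octave=12):
    """Rates for a stretch-then-resample pitch shift (assumed librosa-style).

    Returns (stretch_rate, intermediate_rate): time-stretch the audio by
    stretch_rate with the phase vocoder, then resample the result from
    intermediate_rate back to sample_rate. The resampling step works on
    the waveform, which is why this cannot be done on the spectrogram alone.
    """
    stretch_rate = 2.0 ** (-n_steps / bins_per_octave)
    return stretch_rate, sample_rate / stretch_rate


def mask_along_axis(spec, max_width, axis, mask_value=0.0, rng=None):
    """Return a copy of spec with one random contiguous band masked.

    axis=0 masks frequency bins (rows), axis=1 masks time frames (columns),
    in the spirit of SpecAugment's frequency and time masking.
    """
    rng = rng or random.Random()
    n_freq, n_time = len(spec), len(spec[0])
    size = n_freq if axis == 0 else n_time
    width = rng.randint(0, min(max_width, size))  # band width, may be 0
    start = rng.randint(0, size - width)          # band always fits
    out = [row[:] for row in spec]                # leave the input untouched
    for f in range(n_freq):
        for t in range(n_time):
            idx = f if axis == 0 else t
            if start <= idx < start + width:
                out[f][t] = mask_value
    return out


if __name__ == "__main__":
    # One octave up: stretch to twice the length, then resample 2*sr -> sr.
    print(pitch_shift_plan(12, 16000))
    spec = [[1.0] * 6 for _ in range(4)]
    for row in mask_along_axis(spec, max_width=2, axis=1):
        print(row)
```

On tensors the masking loop would collapse to a single slice assignment; the nested loops here are only to keep the sketch dependency-free.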
I want to make sure that torchaudio has what you need if you want to open pull requests. :) Is there something available in torchaudio-contrib but missing from torchaudio that would prevent you from doing so?
+1 for making a PR directly whenever it's an option.
@vincentqb going by #37 I was thinking of doing a PR for all 4 to torchaudio-contrib first and making sure it was liked/tested well? But yeah nothing would be blocking, so I can do it directly given that it's SpecAugment.
Right, that’s how we planned it. But I guess we can be more flexible? At the moment I don’t have a strong opinion about doing it here first :)
We're also flexible here, and want to be compatible with your workflow. We're really excited about your work, so we want to make sure you can have as much impact as you want :)
@vincentqb great! So I'll make a pull request with the four above (time stretch, pitch shift, time/freq masking) and we can all have a discussion in the context of the PR about changes like where to put them, pitch shift stuff, etc. 👍
@ksanjeevan Perfect, that'd be great!
When looking into torchaudio-contrib, I see the following items that are not in torchaudio:
See