keunwoochoi / torchaudio-contrib

A test bed for updates and new features | pytorch/audio

SpecAugment #38

Open keunwoochoi opened 5 years ago

keunwoochoi commented 5 years ago

https://github.com/zcaceres/spec_augment

This is obviously what we'd be interested in. cc @zcaceres

ksanjeevan commented 5 years ago

Could we add PitchShift and TimeStretch on GPU here too?

keunwoochoi commented 5 years ago

If we go the way of #37, yes, definitely. And it was even faster than on CPU, right?

zcaceres commented 5 years ago

Thanks for the mention!

Not to be promotional, but have you taken a look at fastai-audio? I wrote a bunch of GPU-based data augmentations there with torchaudio.

ksanjeevan commented 5 years ago

The TimeStretch benchmark showed it running faster than librosa on CPU (and way faster on GPU).

keunwoochoi commented 5 years ago

@zcaceres Oh, I'd only heard of it, looks great! Seems like a bunch of great motivations. I'll go through the notebooks myself soon and would love to hear from you about anything good to add here :)

zcaceres commented 5 years ago

@keunwoochoi The notebooks are a bit of a mess, so reader beware! Within the next two weeks I'll be polishing them up and exporting the best code to a usable module.

The idea was to have a full audio pipeline from raw files --> transforms on GPU --> a working classifier and a working sequence (RNN) model, where you can easily operate on the raw signal or on an image representation (spectrogram) of the signal. We are essentially there on functionality, but it needs some polish.
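
For readers unfamiliar with the transform this issue is about, here is a minimal sketch of SpecAugment-style masking in PyTorch. It is not code from spec_augment, fastai-audio, or torchaudio-contrib. The function name `spec_augment` and its parameters are hypothetical, it applies only one frequency mask and one time mask (no time warping), and filling the masked region with zeros is an assumption. Because it uses only tensor indexing, it runs on whatever device the input spectrogram lives on.

```python
import torch


def spec_augment(spec, freq_mask_width=8, time_mask_width=20, generator=None):
    """Zero out one random frequency band and one random time span.

    Hypothetical helper for illustration only.
    spec: tensor of shape (..., n_freq, n_time). Returns a masked copy.
    """
    n_freq, n_time = spec.shape[-2], spec.shape[-1]
    out = spec.clone()  # leave the caller's tensor untouched

    # Frequency mask: pick a width f, then a start bin f0 so the band fits.
    f = int(torch.randint(0, min(freq_mask_width, n_freq) + 1, (1,), generator=generator))
    f0 = int(torch.randint(0, n_freq - f + 1, (1,), generator=generator))
    out[..., f0:f0 + f, :] = 0.0

    # Time mask: same idea along the last (time) axis.
    t = int(torch.randint(0, min(time_mask_width, n_time) + 1, (1,), generator=generator))
    t0 = int(torch.randint(0, n_time - t + 1, (1,), generator=generator))
    out[..., :, t0:t0 + t] = 0.0

    return out
```

Usage would look like `masked = spec_augment(spec.cuda())` for a batch of spectrograms already on the GPU; the same masks are applied to every item in the batch, which a fuller implementation would likely randomize per example.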