neurostatslab / vocalocator

Deep neural networks for sound source localization and vocalization attribution.
MIT License

Reimplement data-augmentations #22

Closed Aramist closed 1 year ago

Aramist commented 1 year ago

In progress branch to test data augmentations:

resolves #20

Aramist commented 1 year ago

So far, the augmentations have been implemented and run on my machine using the pipenv setup housed within the repo. I haven't yet been able to test them on a blank-slate Python installation. Next steps:

  1. Test installation on a fresh python installation (module purge; module load python; on cluster without conda initialized)
  2. Sweep over different combinations of augmentations and their strengths to determine which help or hurt performance.
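The sweep in step 2 could be enumerated with something like the sketch below. The augmentation names and strength values here are illustrative placeholders, not the repo's actual config keys:

```python
from itertools import product

# Hypothetical augmentation names and strength grid; the repo's real
# config keys and value ranges may differ.
augmentations = ["inversion", "noise", "mask"]
strengths = [0.25, 0.5, 1.0]

configs = []
for included in product([False, True], repeat=len(augmentations)):
    active = [a for a, on in zip(augmentations, included) if on]
    if not active:
        configs.append({})  # baseline: no augmentation at all
        continue
    # Cross every active augmentation with every strength value.
    for combo in product(strengths, repeat=len(active)):
        configs.append(dict(zip(active, combo)))

print(len(configs))  # → 64: baseline + all nonempty subsets x strengths
```

With 3 augmentations and 3 strengths each this yields (3+1)^3 = 64 runs, which is why pruning clearly harmful augmentations early keeps the sweep tractable.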
Aramist commented 1 year ago

Folding in a change to the dataloaders referenced in #25, because applying the augmentations unbatched in numpy has proved very slow. Pending: profiling results.
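For illustration, the per-sample vs. batched distinction looks like this. This is a NumPy sketch with made-up shapes, not the repo's actual dataloader code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up shape: (batch, microphone channels, audio samples).
batch = rng.standard_normal((32, 4, 8192)).astype(np.float32)
noise = rng.standard_normal(batch.shape).astype(np.float32)

# Unbatched: one Python-level call per item, as a map-style Dataset
# applying augmentations in __getitem__ would do.
slow = np.stack([batch[i] + 0.1 * noise[i] for i in range(batch.shape[0])])

# Batched: one vectorized op over the whole batch, which also ports
# trivially to a GPU tensor op after collation.
fast = batch + 0.1 * noise

assert np.allclose(slow, fast)
```

Both produce identical output; the win comes from eliminating the Python-level loop and moving the work to a single array operation per batch.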

Aramist commented 1 year ago

I've determined that the inefficiency is not due to the type of dataset being used, as both kinds are parallelized by Torch, but to the augmentations running on the CPU. Running the current commit on a small dataset (finetune_gpup) with CPU-bound augmentations, the first epoch takes ~70s; of these, >60s are spent in the dataloader and <4s in all the forward and backward passes. After switching back to torch-audiomentations, running the augmentations on the GPU, dropping pitch shift, and writing my own masking module, a model running all augmentations (inversion, noise, masking) spends only 0.5s in the DataLoader over the train portion of an epoch when the dataset is on the machine's local drive.
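The logic of a time-masking augmentation like the one described above might look like the following. This is a NumPy sketch of the idea only; the actual module is presumably a torch `nn.Module` operating on batched GPU tensors, and the function name, parameters, and zero-fill behavior here are assumptions:

```python
import numpy as np

def random_time_mask(batch, max_frac=0.1, p=0.5, rng=None):
    """Zero out one random contiguous time span per selected sample.

    batch: array of shape (batch, channels, time). Masking with zeros,
    the max_frac/p parameters, and one span per sample are illustrative
    choices, not the repo's actual implementation.
    """
    rng = rng or np.random.default_rng()
    out = batch.copy()
    n, _, t = batch.shape
    for i in range(n):
        if rng.random() >= p:
            continue  # leave this sample unmasked
        width = int(rng.integers(1, max(2, int(max_frac * t))))
        start = int(rng.integers(0, t - width + 1))
        out[i, :, start:start + width] = 0.0  # mask all channels together
    return out

x = np.ones((8, 4, 1000), dtype=np.float32)
y = random_time_mask(x, max_frac=0.1, p=1.0, rng=np.random.default_rng(0))
assert y.shape == x.shape and (y == 0).any()
```

Written as a batched tensor op, this runs once per batch on the GPU rather than once per sample in the dataloader workers, which is consistent with the dataloader time dropping from >60s to ~0.5s per epoch.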

Aramist commented 1 year ago

TODO: Modify shell scripts to use pipenv

Aramist commented 1 year ago

TODOs before review and merge:

Aramist commented 1 year ago