mostafaelaraby / wavegan-pytorch

PyTorch implementation of the WaveGAN model to generate audio
https://arxiv.org/abs/1802.04208
Apache License 2.0

WaveGAN v2 PyTorch

PyTorch implementation of WaveGAN, a machine learning algorithm which learns to generate raw audio waveforms.

This is a PyTorch port of WaveGAN (Donahue et al. 2018). WaveGAN is a machine learning algorithm which learns to synthesize raw waveform audio by observing many examples of real audio. WaveGAN is comparable to the popular DCGAN approach (Radford et al. 2016) for learning to generate images.

In this repository, we include an implementation of WaveGAN capable of learning to generate up to 4 seconds of audio at 16kHz.
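As a quick check of the "up to 4 seconds at 16kHz" figure, the sketch below converts output lengths in samples to durations. The power-of-two lengths used here (16384, 32768, 65536) are an assumption based on WaveGAN's usual output sizes and are not taken from this README; check params.py for the values this repository actually supports.

```python
SAMPLE_RATE = 16000  # Hz, the sample rate stated above


def duration_seconds(num_samples, sample_rate=SAMPLE_RATE):
    """Return the duration in seconds of a clip with `num_samples` samples."""
    return num_samples / sample_rate


# Assumed output lengths (not confirmed by this README).
for num_samples in (16384, 32768, 65536):
    print(num_samples, "samples ->", duration_seconds(num_samples), "s")
```

Under these assumptions the longest setting gives 65536 / 16000 = 4.096 seconds, which matches the roughly 4 second limit mentioned above.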

WaveGAN is capable of learning to synthesize audio in many different sound domains. In the above figure, we visualize real and WaveGAN-generated audio of speech, bird vocalizations, drum sound effects, and piano excerpts. These sound examples and more can be heard here.

Requirements

pip install -r requirements.txt

Datasets

WaveGAN can now be trained on datasets of arbitrary audio files; earlier versions required preprocessing. You can use any folder containing audio, but here are a few example datasets to help you get started:

WaveGAN Parameters (params.py)

Samples

Quality considerations

If your results are too noisy, try adding a post-processing filter. You may also want to change the amount of phase shuffle in models.py, or remove it entirely. Increasing either the model size or the filter length in models.py may improve results but will increase training time.
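To make the "amount of phase shuffle" concrete, here is a minimal, framework-free sketch of the operation as the WaveGAN paper describes it: each signal is shifted by a random integer offset in [-n, n] and the gap is filled by reflection. This is not the code in models.py. The names `shift_with_reflection`, `phase_shuffle`, and `max_shift` are hypothetical, and the sketch works on plain Python lists, whereas a model would apply the same idea to activation tensors along the time axis.

```python
import random


def shift_with_reflection(signal, shift):
    """Shift a 1-D sequence by `shift` samples while keeping its length.

    Positive shifts move the signal right, negative shifts move it left.
    Samples pushed past one end are dropped; the gap at the other end is
    filled by reflecting the signal about its edge sample.
    """
    signal = list(signal)
    n = len(signal)
    if abs(shift) >= n:
        raise ValueError("shift must be smaller than the signal length")
    if shift == 0:
        return signal
    if shift > 0:
        pad = signal[shift:0:-1]       # signal[shift], ..., signal[1]
        return pad + signal[:n - shift]
    k = -shift
    pad = signal[-2:-k - 2:-1]         # signal[n-2], ..., signal[n-k-1]
    return signal[k:] + pad


def phase_shuffle(signal, max_shift, rng=random):
    """Apply a random shift drawn uniformly from [-max_shift, max_shift]."""
    shift = rng.randint(-max_shift, max_shift)  # both ends inclusive
    return shift_with_reflection(signal, shift)


# Usage: a deterministic shift, then a random one.
print(shift_with_reflection([0, 1, 2, 3, 4, 5], 2))   # [2, 1, 0, 1, 2, 3]
print(shift_with_reflection([0, 1, 2, 3, 4, 5], -2))  # [2, 3, 4, 5, 4, 3]
print(phase_shuffle([0, 1, 2, 3, 4, 5], max_shift=2, rng=random.Random(0)))
```

In this sketch, setting `max_shift` to 0 disables the shuffle, which corresponds to removing phase shuffle altogether.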

Monitoring

The train script generates a fixed set of latent vectors and saves the corresponding output samples to the output directory specified in the params.
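The idea behind this kind of monitoring can be sketched with the standard library alone: draw the latent vectors once from a seeded generator, then pass the same vectors through the model at every checkpoint so that changes in the saved audio reflect training progress and not new noise. Everything named here is hypothetical and not taken from the train script: `make_fixed_latents`, `save_monitoring_samples`, the file naming scheme, the latent size of 100, the uniform [-1, 1] prior, and `dummy_generator`, which only stands in for a trained model.

```python
import math
import os
import random
import struct
import wave

SAMPLE_RATE = 16000  # Hz
LATENT_DIM = 100     # assumption: latent size, check params.py


def make_fixed_latents(count, dim, seed=0):
    """Draw `count` latent vectors once; the seed makes them reproducible."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(count)]


def save_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write floats in [-1, 1] as a mono 16-bit PCM WAV file."""
    pcm = [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(struct.pack("<%dh" % len(pcm), *pcm))


def dummy_generator(z, num_samples=16384):
    """Hypothetical stand-in for a trained generator: a sine set by z[0]."""
    freq = 220.0 + 220.0 * (z[0] + 1.0)
    return [0.5 * math.sin(2.0 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(num_samples)]


def save_monitoring_samples(generator, latents, output_dir, epoch):
    """Run every fixed latent through `generator` and save one WAV each."""
    os.makedirs(output_dir, exist_ok=True)
    paths = []
    for i, z in enumerate(latents):
        path = os.path.join(output_dir, "epoch%04d_sample%02d.wav" % (epoch, i))
        save_wav(path, generator(z))
        paths.append(path)
    return paths
```

Calling `save_monitoring_samples(dummy_generator, make_fixed_latents(4, LATENT_DIM), "output", epoch=1)` would write four files under `output/`; reusing the same latents at later epochs keeps the saved samples directly comparable.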

Contributions

This repo is based on the implementations by chrisdonahue, jtcramer, and mazzzystar.

Attribution

If you use this code in your research, cite via the following BibTeX:

@inproceedings{donahue2019wavegan,
  title={Adversarial Audio Synthesis},
  author={Donahue, Chris and McAuley, Julian and Puckette, Miller},
  booktitle={ICLR},
  year={2019}
}