teticio / audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
GNU General Public License v3.0
707 stars 69 forks

training notebook #14

Closed GeorvityLabs closed 1 year ago

GeorvityLabs commented 1 year ago

@teticio ,

it would be amazing if you could make a training Colab notebook, so that users could upload their own samples and train on a Colab T4 GPU.

Also, how many samples would you recommend to get good results? For example:

If a user is training on 1 s audio clips of different bird chirps, how many audio samples would be needed as input for training with DDPM? Have you done any experiments?

What if we only have a small number of audio files, say 10 wav files of 2 s each? Would that work? If so, how many epochs should one train for?

In case you make a training notebook, I hope you mention the recommended number of samples and training epochs in the notebook instructions.

GeorvityLabs commented 1 year ago

@teticio, I followed your instructions to try training on Colab. It completes epoch 0 up to 50 steps, but after that the following error pops up:

Epoch 0: 100% 50/50 [00:41<00:00, 1.21it/s, ema_decay=0.946, loss=0.106, lr=1e-5, step=50]
Traceback (most recent call last):
  File "scripts/train_unconditional.py", line 381, in <module>
    main(args)
  File "scripts/train_unconditional.py", line 284, in main
    batch_size=args.eval_batch_size,
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/audiodiffusion/__init__.py", line 256, in __call__
    self.progress_bar(self.scheduler.timesteps[start_step:])):
AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 859]].

The error was AttributeError: 'AudioDiffusionPipeline' object has no attribute 'progress_bar'

I've attached a screenshot of the error below.

Hope you can suggest a fix for the same.

[Screenshot from 2022-11-02 16-54-33]

teticio commented 1 year ago

Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

GeorvityLabs commented 1 year ago

Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

I was using the one that was in the requirements: diffusers>=0.2.4

GeorvityLabs commented 1 year ago

Thanks for bringing this to my attention. What version of Diffusers are you using? I just updated to the latest one and it broke in a different way sigh

Could you make a training Colab notebook? It would be a great way to get started, since only the inference notebook is currently available.

teticio commented 1 year ago

I will look into making a colab notebook for training.

As you can see in the requirements, the version will be greater than or equal to that; depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command line arguments you used that led to the error.
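For example, from a notebook cell you can note the installed version directly (a minimal sketch; running pip list in a shell cell works just as well):

```python
# Print the installed diffusers version to compare against requirements.txt
import diffusers

print(diffusers.__version__)
```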

GeorvityLabs commented 1 year ago

I will look into making a colab notebook for training.

As you can see in requirements, the version will be greater than or equal to that. Depending on when you installed it, it will be one version or another. Can you do pip list and note the version? Also, it would help to know the command line arguments you used that lead to an error .

Cool, looking forward to the notebook. I think it was 0.2.4, as in the requirements.txt.

teticio commented 1 year ago

You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!

GeorvityLabs commented 1 year ago

You need a more recent version. I have updated the requirements.txt and setup.cfg files to require a version >= 0.4.1. It should work with the latest version of diffusers so just do pip install --upgrade diffusers and let me know if that works for you. Thanks!

Yes, this works. I removed the version specification and just used pip install diffusers, so now it is working. Definitely looking forward to your notebook as well. Maybe you can mention the recommended number of input samples, and probably you could also include a script at the end of the training notebook where users can generate .wav sample outputs from their trained model.

GeorvityLabs commented 1 year ago

I just reached epoch 99, then this happens:

Several commits (11) will be pushed upstream.
WARNING:huggingface_hub.repository:Several commits (11) will be pushed upstream.
100% 1000/1000 [04:03<00:00, 4.10it/s]
Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 866], [push command, status code: running, in progress. PID: 1113], [push command, status code: running, in progress. PID: 1336], [push command, status code: running, in progress. PID: 1493], [push command, status code: running, in progress. PID: 1624], [push command, status code: running, in progress. PID: 1751], [push command, status code: running, in progress. PID: 1882], [push command, status code: running, in progress. PID: 2013], [push command, status code: running, in progress. PID: 2144], [push command, status code: running, in progress. PID: 2279], [push command, status code: running, in progress. PID: 2409]].
ERROR:huggingface_hub.repository:Waiting for the following commands to finish before shutting down: [[push command, status code: running, in progress. PID: 866], [push command, status code: running, in progress. PID: 1113], [push command, status code: running, in progress. PID: 1336], [push command, status code: running, in progress. PID: 1493], [push command, status code: running, in progress. PID: 1624], [push command, status code: running, in progress. PID: 1751], [push command, status code: running, in progress. PID: 1882], [push command, status code: running, in progress. PID: 2013], [push command, status code: running, in progress. PID: 2144], [push command, status code: running, in progress. PID: 2279], [push command, status code: running, in progress. PID: 2409]].

[Screenshot from 2022-11-02 20-08-03]

Any idea why this happens, @teticio?

teticio commented 1 year ago

Best to open separate issues, but I will answer. It is pushing your model checkpoints to Hugging Face. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can train the models locally without pushing to the hub. I'll try to replicate in the meantime.

I added a notebook for you https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb

If you are doing short samples (like 1 second or so), you should change the resolution to something like 64,256 (not all resolutions work; it's best to stick to powers of 2).
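As a rough sketch of the kind of usage the linked notebook covers, using the AudioDiffusion helper class from this repo (the method name generate_spectrogram_and_audio is taken from the repo README and may differ slightly between versions):

```python
from IPython.display import Audio, display
from audiodiffusion import AudioDiffusion

# Load a trained model from the Hugging Face hub (or a local output directory)
audio_diffusion = AudioDiffusion(model_id="teticio/audio-diffusion-256")

# Generate one mel spectrogram image and the corresponding audio
image, (sample_rate, audio) = audio_diffusion.generate_spectrogram_and_audio()

display(image)
display(Audio(audio, rate=sample_rate))
```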

GeorvityLabs commented 1 year ago

@teticio thanks for the clarification.

btw, the notebook you linked is an inference notebook, right?

GeorvityLabs commented 1 year ago

Best to open separate issues, but I will answer. It is pushing your model checkpoints to huggingface. It should work, but maybe you have a slow internet connection? Or perhaps there is some issue pushing from Colab? You can run the models without pushing to the hub in local. I'll try to replicate in the meantime.

I added a notebook for you https://colab.research.google.com/github/teticio/audio-diffusion/blob/master/notebooks/test_model.ipynb

If you are doing short samples (like 1 sec or so), you should change the resolution to something 64,256 (not all resolutions work, best to be powers of 2).

I found the training notebook inside the notebooks folder in the repo, but I guess you linked a different notebook by mistake in your comment above.

GeorvityLabs commented 1 year ago

btw @teticio,

if I have 100 one-second wav files, do I only have to train for 10 epochs using DDIM? Would that be enough, or is there a formula that relates the number of epochs to the number of audio samples (and audio length)?

GeorvityLabs commented 1 year ago

@teticio, I just heard the generation after training for 10 epochs on 100 one-second audio clips, and it was just noise. Now I've changed the number of epochs to 100 and am retraining the model. Do you have any other recommendations?

teticio commented 1 year ago

You have to play around with it, I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds, of music that is relatively homogeneous. For that it took a week on an RTX 2080 Ti GPU. So on Colab with a 12 hour limit(?) you are going to be limited. Btw, I wasn't able to push to the hub from Colab; maybe you have had more luck.

GeorvityLabs commented 1 year ago

You have to play around with it I'm afraid. It depends on many factors. I only have experience of training with 20,000 - 30,000 music samples of 5 seconds, of music that is relatively homogenous. For that it took a week on a RTX 2080 Ti GPU. So on Colab with a 12 hour limit(?) you are going to be limited. Btw, I wasn't able to push to the hub from Colab, maybe you have had more luck.

That is interesting. How many epochs did you train the 20k 5 s samples for? Also, do you have any suggestions on how to denoise the generated output, since noise gets added to empty spaces? Did you ever look into anything in that direction?

I was able to push to Hugging Face by doing the following:

First I cd into the model folder, then I run the following commands:

!git lfs install
!git add .
!git lfs migrate import --everything
!git commit -m "initial commit"
!git push origin main --force

This allowed me to successfully push the model via Colab.

Also, I have a local GPU which I use for model training via a Jupyter notebook.

teticio commented 1 year ago

So I did 100 epochs (you can see the TensorBoard here: https://huggingface.co/teticio/audio-diffusion-256/tensorboard). After 50 it was pretty good, but it did continue to improve.

Thanks for the info on pushing - that is very useful to know.

GeorvityLabs commented 1 year ago

@teticio, that's good to know.

I was going through the training notebook. It generates the image and audio.

I was able to save the image to a local path using image.save(), but how can I save the audio to a local path within the Python script as a .wav file?

For saving the image I used:

image.save("filename.jpg", 'JPEG')

The audio is displayed using display(Audio(audio, rate=sample_rate)).

To save it I tried (from scipy.io.wavfile import write):

write('test.wav', sample_rate, audio)

but that saves a file which is very low in volume. Do you have a fix to save it properly as a .wav to a local path?

teticio commented 1 year ago

The easiest way is to click on the three dots on the audio widget and download it from there.

GeorvityLabs commented 1 year ago

The easiest way is to click on the three dots on the audio widget and download it from there.

Yeah, I wanted to write it in the script, because if I'm generating 100 files, clicking download 100 times would be a hassle.

So do you have any workaround for this? Maybe some other way to save as .wav?

teticio commented 1 year ago

I don't have one off the top of my head. I'm sure you can figure it out by searching for a way using, say, librosa.
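For scripting it, here is a minimal sketch, assuming audio is a float NumPy array roughly in the range [-1, 1] and sample_rate is the rate used above: soundfile writes float arrays directly, or you can scale to 16-bit PCM before handing the array to scipy so players interpret the levels correctly.

```python
import numpy as np
import soundfile as sf
from scipy.io.wavfile import write

# Option 1: soundfile handles float arrays in [-1, 1] directly
sf.write("generated_000.wav", audio, sample_rate)

# Option 2: convert to 16-bit PCM before using scipy.io.wavfile.write
write("generated_000_int16.wav", sample_rate, (audio * np.iinfo(np.int16).max).astype(np.int16))
```

In a loop over 100 generations, only the filename needs to change.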

GeorvityLabs commented 1 year ago

@teticio ,

Since the mel inputs here are grayscale, I was wondering if you have implemented any new ideas in this repo compared to the original DDPM implementation in diffusers, which is general purpose (for color images too).

Since in the case of mel we only use grayscale input and expect only grayscale to be generated, did you change any parameters or techniques so that DDPM works better with grayscale image training?

If so, I'd love to know some of the novelties you and the team have implemented in this repo.

GeorvityLabs commented 1 year ago

@teticio ,

did you implement anything to avoid wasted computation? Because you are only training on grayscale inputs here, you might have changed things from the usual DDPM implementation, which is more focused on natural color images that are more complex.

Compared to those natural images, aren't grayscale mel spectrograms simpler images, in terms of features etc.?

So I wanted to know if you have added any unique implementations in this repo compared to vanilla DDPMs; if so, I hope you can mention them here.

teticio commented 1 year ago

Team haha - it's just me really.

I made adaptations to use grayscale (1 channel) as opposed to colour (3 channels). This was relatively straightforward in train_unconditional.py - the main changes involved replacing 3 with 1 and 'RGB' with 'L'. The train_vae.py took a bit more work - the changes there can be found by searching for similar terms, as well as for channels. Most importantly, I changed the config file ldm_autoencoder_kl.yaml compared to the version I got from the Stable Diffusion repo.
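To illustrate the 1-channel versus 3-channel point, a minimal sketch using Pillow (not code from this repo):

```python
import numpy as np
from PIL import Image

# A mel spectrogram is naturally a single-channel image; PIL infers mode "L" (grayscale) for 2D uint8 data
spec = Image.fromarray(np.random.randint(0, 256, (256, 256), dtype=np.uint8))
print(spec.mode, np.array(spec).shape)       # L (256, 256) -> 1 input channel for the UNet

# Converting the same data to "RGB" triples the channels without adding information
print(np.array(spec.convert("RGB")).shape)   # (256, 256, 3)
```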

GeorvityLabs commented 1 year ago

@teticio, that is interesting to hear. So L is the lightness of the image, right?

Could you go into a bit more detail regarding the changes made in train_vae.py and ldm_autoencoder_kl.yaml, and maybe explain what motivated those changes and how they affect the training process positively?

teticio commented 1 year ago

L = mode for grayscale, as opposed to RGB

I think the best thing to do is to diff against the files that I used as a starting point: https://github.com/huggingface/diffusers/blob/main/examples/train_unconditional.py, https://github.com/CompVis/stable-diffusion/blob/main/configs/autoencoder/autoencoder_kl_32x32x4.yaml and https://github.com/CompVis/stable-diffusion/blob/main/main.py (for train_vae.py).

I found that the training speed and convergence were faster, and the GPU memory requirement was lower.