teticio / audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
GNU General Public License v3.0

teticio/audio-diffusion-256 is really good #37

Closed. GODGANG4885 closed this issue 1 year ago.

GODGANG4885 commented 1 year ago

The results of the model trained on the teticio/audio-diffusion-256 data are very good, and I am curious about this dataset. What is its final size? If you cut 20,000 songs into 5-second pieces, you would end up with far more than 20,000 samples. If the final number of 5-second pieces is 20,000, how many original songs were used?

Second, the script provided by the author clips audio to 10 seconds, so why was this dataset clipped to 5 seconds for training? Does clipping at 5 seconds perform better than clipping at 10 seconds?

teticio commented 1 year ago

Hi! Thanks for the comment!

There were about 450 songs in the set, which worked out to 20,000 5-second slices. The length of each slice depends on the resolution of the mel spectrogram, which in turn depends on the parameters you give it (and determines aspects of the quality).
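The relationship described above can be sketched as simple arithmetic. This is a minimal sketch, assuming a slice is a mel spectrogram `x_res` frames wide where each frame advances by `hop_length` samples; the function names and the values 256, 512 and 22050 Hz are hypothetical illustrations, not settings confirmed in this thread.

```python
def slice_duration_seconds(x_res: int, hop_length: int, sample_rate: int) -> float:
    """Approximate audio length covered by a mel spectrogram x_res frames wide.

    Assumption: each spectrogram column advances the analysis window by
    hop_length samples, so x_res columns span about x_res * hop_length samples.
    """
    return x_res * hop_length / sample_rate


def slices_per_song(num_slices: int, num_songs: int) -> float:
    """Average number of slices contributed by each song."""
    return num_slices / num_songs


if __name__ == "__main__":
    # Hypothetical parameter values for illustration only.
    duration = slice_duration_seconds(x_res=256, hop_length=512, sample_rate=22050)
    print(f"slice length ~ {duration:.2f} s")

    # Figures from the thread: about 450 songs produced about 20,000 slices.
    per_song = slices_per_song(20_000, 450)
    print(f"~{per_song:.1f} slices per song")

    # At 5 seconds per slice, this is the average audio used per song.
    print(f"~{per_song * 5 / 60:.1f} minutes per song")
```

With these assumed values a slice covers about 5.9 seconds, so changing `x_res`, `hop_length` or the sample rate changes the slice length. The thread's figures imply roughly 44 slices per song, or about 3.7 minutes of audio per song at 5 seconds per slice.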