teticio/audio-diffusion-256 is really good

teticio / audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.

GNU General Public License v3.0


The model trained on the teticio/audio-diffusion-256 dataset produces very good results, and I am curious about the dataset itself. What is its final size? If 20,000 songs were each cut into 5-second slices, the number of slices would be far larger than 20,000. If instead the final number of 5-second slices is 20,000, how many original songs were used?
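To make the two readings of "20,000" concrete, here is a rough back-of-the-envelope sketch. The average song length of 200 seconds is purely an assumption for illustration, and the function names are hypothetical; neither comes from the dataset or the repository.

```python
import math


def count_slices(num_songs: int, avg_song_seconds: float, slice_seconds: float) -> int:
    """Total non-overlapping slices, dropping each song's trailing remainder."""
    slices_per_song = math.floor(avg_song_seconds / slice_seconds)
    return num_songs * slices_per_song


def songs_needed(num_slices: int, avg_song_seconds: float, slice_seconds: float) -> int:
    """Minimum number of songs required to produce at least num_slices slices."""
    slices_per_song = math.floor(avg_song_seconds / slice_seconds)
    return math.ceil(num_slices / slices_per_song)


# ASSUMPTION: an average song lasts 200 seconds (illustrative value only).
AVG_SONG_SECONDS = 200.0

# Reading 1: 20,000 songs, each cut into 5-second slices.
print(count_slices(20_000, AVG_SONG_SECONDS, 5.0))

# Reading 2: 20,000 slices in total, so how many songs were needed?
print(songs_needed(20_000, AVG_SONG_SECONDS, 5.0))
```

Under that assumed song length the two readings differ by a factor of 40, which is why I am asking which one applies.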

Second, the script provided by the author clips audio to 10 seconds, so why was this dataset clipped to 5 seconds for training? Does clipping to 5 seconds perform better than clipping to 10 seconds?
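One thing that may be relevant here, though I have not confirmed it against the repository, is that the clip length of a spectrogram slice follows from its width in frames, the hop length, and the sample rate. The sketch below uses the general relation duration ≈ frames × hop_length / sample_rate; the specific values (256 frames, hop length 512, 22,050 Hz) are assumptions for illustration, not settings taken from the dataset.

```python
def slice_duration_seconds(num_frames: int, hop_length: int, sample_rate: int) -> float:
    """Approximate audio duration covered by a spectrogram that is num_frames wide."""
    return num_frames * hop_length / sample_rate


# ASSUMED values, for illustration only.
SAMPLE_RATE = 22_050
HOP_LENGTH = 512

for frames in (256, 512):
    seconds = slice_duration_seconds(frames, HOP_LENGTH, SAMPLE_RATE)
    print(f"{frames} frames -> {seconds:.2f} s")
```

If the dataset was built with settings like these, a 256-wide image would cover only a few seconds of audio regardless of any 10-second default elsewhere, but I would appreciate confirmation of the actual parameters.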


teticio/audio-diffusion-256 is really good #37