teticio / audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
GNU General Public License v3.0
707 stars, 69 forks

Is the longer music sample just a repetition of a shorter sample? #28

Closed. gandolfxu closed this issue 1 year ago

gandolfxu commented 1 year ago

I have found that the generated music sample repeats every few seconds. Does "loop" mean that a short sample is simply repeated?

Can this model generate long, natural-sounding music?

teticio commented 1 year ago

Each music sample corresponds to an image, so the length of the sample can be thought of as the x-resolution of that image. This is limited by the GPU memory you have available. As it stands, it is limited to short samples of a few seconds, but there are ways to stitch these samples together, which are explored in the sample notebooks. The idea of this repo was to show what could be done with a single commercial-grade GPU. Hopefully someone with access to much more compute power can do something more impressive.
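
To make the two points above concrete, here is a rough sketch; it is not code from this repo. It assumes the image is a spectrogram whose columns are audio frames spaced `hop_length` samples apart, and the values 256, 512 and 22050 are illustrative assumptions rather than confirmed settings. The function names are hypothetical, and the linear crossfade is only one generic way to join clips, not necessarily what the sample notebooks do.

```python
def sample_duration_seconds(x_res, hop_length, sample_rate):
    """Approximate audio length covered by a spectrogram x_res frames wide."""
    return x_res * hop_length / sample_rate


def crossfade_stitch(chunks, overlap):
    """Join audio chunks (lists of floats), linearly crossfading `overlap` samples."""
    if not chunks:
        return []
    out = list(chunks[0])
    for chunk in chunks[1:]:
        if overlap == 0:
            out.extend(chunk)
            continue
        if overlap > min(len(out), len(chunk)):
            raise ValueError("overlap is longer than a chunk")
        start = len(out) - overlap
        for i in range(overlap):
            # Weight of the incoming chunk rises across the overlap region.
            w = (i + 1) / (overlap + 1)
            out[start + i] = (1 - w) * out[start + i] + w * chunk[i]
        out.extend(chunk[overlap:])
    return out


if __name__ == "__main__":
    # Assumed, illustrative settings: 256 frames, hop of 512 samples, 22.05 kHz.
    print(round(sample_duration_seconds(256, 512, 22050), 2), "seconds per sample")
    # Two toy "clips" joined with a 2-sample crossfade.
    print(crossfade_stitch([[1.0] * 4, [0.0] * 4], overlap=2))
```

Under these assumptions, doubling the x-resolution doubles the clip length, which is why memory becomes the limit. Stitching the same clip to itself produces exactly the kind of audible loop described in the question, whereas stitching different generated clips gives a longer piece without retraining.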