teticio / audio-diffusion

Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
GNU General Public License v3.0
707 stars, 69 forks

Recommended training hyperparameters for 44.1 kHz & 48 kHz sample rates #22

moiseshorta closed 1 year ago

moiseshorta commented 1 year ago

Hi,

Thanks for the great repository and code. I've been having fun training some models with it using the default parameters.

I'm trying to experiment with higher sample rates, such as 44.1 kHz and 48 kHz.

What values of hop_length and n_fft would be needed to achieve good results at those sample rates?

Thanks for the tips!

teticio commented 1 year ago

I found that

"hop_length": 512,
"n_fft": 2000,

worked well for higher sample rates or n_mels. Otherwise I noticed a "whistling" artefact. In any case, use the test_mel notebook to test a full round trip (audio->mel->audio) before wasting time on creating a dataset or training a model.
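For a rough feel of what these two values imply at 44.1 kHz and 48 kHz, here is a minimal sketch in plain Python. It is not code from the repository: `stft_summary` and `StftSummary` are hypothetical names, and it assumes the analysis window is as long as `n_fft` (the usual default when no separate window length is set). It only does arithmetic and does not replace the audio->mel->audio round trip in the test_mel notebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StftSummary:
    """Derived quantities for one STFT configuration (hypothetical helper)."""
    window_ms: float  # duration of one analysis window in milliseconds
    bin_hz: float     # spacing between adjacent linear FFT bins in Hz
    overlap: float    # fraction of each window shared with the next one
    n_bins: int       # number of non-negative frequency bins


def stft_summary(sample_rate: int, n_fft: int, hop_length: int) -> StftSummary:
    """Summarize the time/frequency trade-off of an STFT setting.

    Assumption: the window length equals n_fft.
    """
    if sample_rate <= 0 or n_fft <= 0 or hop_length <= 0:
        raise ValueError("sample_rate, n_fft and hop_length must be positive")
    return StftSummary(
        window_ms=1000.0 * n_fft / sample_rate,
        bin_hz=sample_rate / n_fft,
        overlap=1.0 - hop_length / n_fft,
        n_bins=n_fft // 2 + 1,
    )


if __name__ == "__main__":
    # Settings quoted in this thread, evaluated at both sample rates.
    for sr in (44100, 48000):
        print(sr, stft_summary(sr, n_fft=2000, hop_length=512))
```

Comparing these numbers across candidate settings can help narrow down what to try, but listening to the reconstructed audio remains the real test.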

moiseshorta commented 1 year ago

Thank you! I will try those settings.

I recently trained a model at 44.1 kHz (256x256) with these settings and achieved good results, albeit with slower training time:

n_fft=4096 hop_length=512
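To see how much audio one 256-frame spectrogram covers with this hop length, the sketch below does the arithmetic. It assumes each spectrogram column advances by exactly `hop_length` samples and ignores edge padding, so the repository's actual slice length may differ by a sample or so. The function names are hypothetical.

```python
def slice_samples(frames: int, hop_length: int) -> int:
    """Approximate number of audio samples spanned by `frames` STFT columns.

    Assumption: one column per hop, edge padding ignored.
    """
    if frames <= 0 or hop_length <= 0:
        raise ValueError("frames and hop_length must be positive")
    return frames * hop_length


def slice_seconds(frames: int, hop_length: int, sample_rate: int) -> float:
    """Approximate duration in seconds of one spectrogram slice."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return slice_samples(frames, hop_length) / sample_rate


if __name__ == "__main__":
    # 256 time frames with hop_length=512, as in the settings above.
    for sr in (44100, 48000):
        print(f"{sr} Hz: {slice_seconds(256, 512, sr):.3f} s per slice")
```

Under these assumptions, a 256-frame slice with hop_length=512 covers a little under 3 seconds of audio at 44.1 kHz.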