happylittlecat2333 / Auffusion

Official codes and models of the paper "Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation"
https://auffusion.github.io/

About pre-trained VAE #9

Open kaiw7 opened 8 months ago

kaiw7 commented 8 months ago

Hi, do you directly use the pre-trained VAE from LDM, or is the VAE first pre-trained on audio spectrograms? Thank you very much.

happylittlecat2333 commented 8 months ago

Hi, we directly use the pretrained VAE of Stable Diffusion.
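
For readers who want to try this, here is a minimal sketch of loading a pretrained Stable Diffusion VAE with the diffusers library and encoding a spectrogram "image" with it. It is not taken from the Auffusion code. The checkpoint id, the function names `latent_shape` and `encode_with_sd_vae`, and the assumption that the spectrogram is already a 3-channel tensor scaled to [-1, 1] are all illustrative; the thread does not say which Stable Diffusion checkpoint the authors use.

```python
def latent_shape(height: int, width: int, factor: int = 8, channels: int = 4):
    """Latent (C, H, W) shape the Stable Diffusion 1.x/2.x VAE yields for an H x W input.

    The VAE downsamples each spatial side by 8 and has 4 latent channels.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (channels, height // factor, width // factor)


def encode_with_sd_vae(spec_image, model_id="stable-diffusion-v1-5/stable-diffusion-v1-5"):
    """Encode a (B, 3, H, W) float tensor in [-1, 1] into scaled VAE latents.

    `model_id` is an assumed checkpoint, not one confirmed by the authors.
    torch and diffusers are imported here so that `latent_shape` can be used
    without them installed.
    """
    import torch
    from diffusers import AutoencoderKL

    # Load only the VAE weights from the Stable Diffusion checkpoint.
    vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
    vae.eval()
    with torch.no_grad():
        latents = vae.encode(spec_image).latent_dist.sample()
    # Scale latents the way diffusers training scripts do before the UNet sees them.
    return latents * vae.config.scaling_factor
```

For example, `latent_shape(256, 1024)` gives `(4, 32, 128)`, which is the latent size a 256 x 1024 spectrogram image would have under these assumptions.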

kaiw7 commented 8 months ago

Hi, thank you very much for your quick reply. Could you tell me when you will release the data pre-processing and training scripts? Alternatively, I would appreciate it if you could point me to a reference for these two scripts.

kaiw7 commented 8 months ago

Hi, could you tell me when you will release the complete scripts, including model training? Many thanks.

happylittlecat2333 commented 8 months ago

Hi, thanks for your attention! We have decided to release our complete scripts after acceptance. However, our training scripts are adapted from the diffusers training scripts, and for data pre-processing you can refer to the data conversion scripts in our project. You may try to adapt these scripts to train your own models.

kaiw7 commented 8 months ago

Hi, could you tell me which functions/classes in the conversion scripts are used for training? I don't think all of the functions there are used.

kaiw7 commented 8 months ago

Hi, could you please release the script showing how to obtain the audio mel-spectrogram and normalize it for model training? There are many functions, so we don't know which ones are used to prepare the audio data. Many thanks.

twobob commented 2 weeks ago

Obtain the audio mel-spectrogram: https://github.com/happylittlecat2333/Auffusion/blob/f44233d1d0f6444653606b6189e090e999d79656/converter.py#L177
Normalize it for model training: https://github.com/happylittlecat2333/Auffusion/blob/f44233d1d0f6444653606b6189e090e999d79656/converter.py#L109
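
As a rough illustration of the two steps discussed in this thread (computing a mel-spectrogram, then normalizing it), here is a small NumPy-only sketch. It is not the code in `converter.py`. The function names, the parameter values (16 kHz sample rate, 1024-point FFT, hop of 160, 64 mel bins, log floor of 1e-5), and the min-max scaling to [-1, 1] are assumptions chosen for the example, so check the linked lines for what the project actually does.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points, equally spaced on the mel scale: band edges and centres.
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(audio, sr=16000, n_fft=1024, hop=160, n_mels=64):
    """Return a log-mel spectrogram of shape (n_mels, n_frames)."""
    if len(audio) < n_fft:
        raise ValueError("audio is shorter than one FFT window")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2      # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T     # (n_frames, n_mels)
    # Floor before the log so silent bins do not produce -inf.
    return np.log(np.clip(mel, 1e-5, None)).T


def normalize_spectrogram(spec, lo=None, hi=None):
    """Min-max scale to [-1, 1]; pass fixed lo/hi to share one range across a dataset."""
    lo = spec.min() if lo is None else lo
    hi = spec.max() if hi is None else hi
    return 2.0 * (spec - lo) / max(hi - lo, 1e-8) - 1.0


# Usage: one second of a 440 Hz tone at 16 kHz.
t = np.arange(16000) / 16000.0
spec = log_mel_spectrogram(np.sin(2.0 * np.pi * 440.0 * t))
norm = normalize_spectrogram(spec)
```

Per-clip min-max scaling is the simplest choice here. For training, a fixed `lo`/`hi` shared by the whole dataset keeps loudness differences between clips.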