luosiallen / Diff-Foley

Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Apache License 2.0

Questions about the pretrained params #1

Closed auzxb closed 1 year ago

auzxb commented 1 year ago

Thank you for your work. May I ask whether you retrained the parameters of PANN and SlowOnly? The configuration described in the paper does not seem entirely consistent with the configurations of these two pretrained models.

luosiallen commented 1 year ago

We did not retrain PANN or SlowOnly. We directly use the pretrained weights of PANN and SlowOnly to initialize the CLAP model training. We also introduce some additional trainable projection layers so that the model matches the configuration described in the paper.
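
For readers unfamiliar with this pattern, here is a minimal PyTorch sketch of initializing an encoder from pretrained weights and adding a trainable projection layer on top. It is not the Diff-Foley code: the `ProjectedEncoder` class, the stand-in backbone, and the dimensions (64, 2048, 512) are all hypothetical, chosen only for illustration.

```python
import torch
import torch.nn as nn


class ProjectedEncoder(nn.Module):
    """Wrap a pretrained backbone and map its features to a new dimension."""

    def __init__(self, backbone: nn.Module, backbone_dim: int, embed_dim: int):
        super().__init__()
        # Backbone starts from pretrained weights (loaded by the caller).
        self.backbone = backbone
        # New projection layer, randomly initialized and trained from scratch.
        self.proj = nn.Linear(backbone_dim, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(self.backbone(x))


# Hypothetical stand-in for a pretrained encoder such as PANN or SlowOnly.
pretrained = nn.Sequential(nn.Linear(64, 2048), nn.ReLU())
pretrained_state = {k: v.clone() for k, v in pretrained.state_dict().items()}

# Build the same architecture and load the pretrained weights into it,
# rather than training the backbone from random initialization.
backbone = nn.Sequential(nn.Linear(64, 2048), nn.ReLU())
backbone.load_state_dict(pretrained_state)

# Assumed dimensions: 2048-d backbone features projected to a 512-d embedding.
encoder = ProjectedEncoder(backbone, backbone_dim=2048, embed_dim=512)
features = encoder(torch.randn(4, 64))
```

In this sketch the backbone parameters are left trainable; whether the actual training fine-tunes or freezes the pretrained encoders is not stated in this thread.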