seervideodiffusion / SeerVideoLDM

[ICLR 2024] Seer: Language Instructed Video Prediction with Latent Diffusion Models

Classifier free guidance #4

Open LPY1219 opened 2 months ago

LPY1219 commented 2 months ago

Hi Mr. Gu, I am trying out your code. I noticed that you do not mask the text in your training code, but you do use classifier-free guidance in your eval code. Was this done on purpose? I look forward to your reply.

XianfanGu commented 1 month ago

Thank you for your question. Text masking for classifier-free guidance is not necessary during fine-tuning, because we freeze the pre-trained layers (cross-attention, spatial attention, and residual layers) of the Stable Diffusion backbone, which has already been trained on masked text tokens. The fine-tuned Seer can therefore understand null text tokens when classifier-free guidance is used at inference, without additional training.
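For readers following along, here is a minimal sketch of the two pieces being discussed: the text masking that a backbone usually sees during pre-training, and the guidance step applied at inference. This is not the Seer code. The function names `drop_prompt` and `guided_noise`, the use of an empty string as the null prompt, and the plain Python lists standing in for noise tensors are all assumptions made for illustration.

```python
import random
from typing import List


def drop_prompt(prompt: str, p_uncond: float, rng: random.Random) -> str:
    """Training-time text masking (hypothetical helper).

    With probability p_uncond the prompt is replaced by the null prompt,
    assumed here to be the empty string, so the model also learns an
    unconditional noise prediction.
    """
    return "" if rng.random() < p_uncond else prompt


def guided_noise(eps_uncond: List[float], eps_cond: List[float],
                 scale: float) -> List[float]:
    """Inference-time classifier-free guidance (hypothetical helper).

    Combines the unconditional and text-conditioned noise predictions as
    eps = eps_uncond + scale * (eps_cond - eps_uncond).
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("noise predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    # p_uncond = 0.1 is an illustrative value, not taken from this repo.
    print(repr(drop_prompt("move the red block left", 0.1, rng)))
    # scale = 1.0 reproduces the conditional prediction; larger values
    # push the result further away from the unconditional one.
    print(guided_noise([0.0, 1.0], [1.0, 3.0], 2.0))
```

In this sketch, only `drop_prompt` belongs to training. The point of the reply above is that the frozen backbone layers already went through that step during pre-training, so only the `guided_noise` step is needed at inference.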