Open LPY1219 opened 2 months ago
Hi, Mr. Gu, I am trying your code. I notice that you do not mask the text in your training code, but you do use classifier-free guidance in your eval code. I want to know if you did this on purpose. Looking forward to your reply.
Thank you for your question. Classifier-free guidance is not necessary during fine-tuning, since we have frozen the pre-trained layers (cross-attention, spatial attention, residual layers) from the Stable Diffusion backbone, which already processed masked text tokens during pre-training. Therefore, the fine-tuned Seer can understand null text tokens when using classifier-free guidance during inference, without additional training.
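For readers following this thread, here is a minimal sketch of the two pieces being discussed: randomly replacing the prompt with a null prompt during training (the text masking the question asks about), and combining the unconditional and conditional noise predictions at inference. This is not Seer's code. The function names `maybe_drop_prompt` and `cfg_combine`, the empty string as the null prompt, and the plain Python lists standing in for noise tensors are all assumptions made for illustration.

```python
import random


def maybe_drop_prompt(prompt, drop_prob, rng):
    """Hypothetical training-time text masking.

    With probability drop_prob the prompt is replaced by the null prompt
    (assumed here to be the empty string), so the model also learns an
    unconditional prediction.
    """
    return "" if rng.random() < drop_prob else prompt


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Hypothetical inference-time classifier-free guidance step.

    Extrapolates from the unconditional prediction toward the conditional one:
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    Lists of floats stand in for the noise tensors.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Training side: some prompts are swapped for the null prompt.
    print([maybe_drop_prompt("pick up the cup", 0.1, rng) for _ in range(5)])
    # Inference side: guidance_scale = 1.0 recovers the conditional prediction.
    print(cfg_combine([0.0, 1.0], [1.0, 3.0], 1.0))
```

In this sketch, a `guidance_scale` of 0 returns the unconditional prediction and a scale of 1 returns the conditional one. The reply above argues that the first function can be skipped during fine-tuning because the frozen Stable Diffusion layers have already been exposed to null text tokens.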