showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

Question about training loss. #86

Open Guanys-dar opened 10 months ago

Guanys-dar commented 10 months ago

Thank you for your excellent work, which has been very inspiring to me. 

I have a question about the loss function used to fine-tune your network. In the paper, you mention using "the same training objective as in standard LDMs" during fine-tuning. However, Figure 4 of the paper shows a pixel-wise reconstruction loss, which appears to be computed between the input video and the reconstructed video rather than on the predicted noise. Could you please clarify whether I am misunderstanding something?
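For reference, the standard LDM objective the paper refers to is a noise-prediction (epsilon-prediction) MSE computed in latent space, not a pixel-wise loss between the input and reconstructed video. Below is a minimal sketch of that objective in the style of the diffusers library; the `vae`, `unet`, `noise_scheduler`, and `text_embeddings` names are illustrative placeholders and this is not the repo's exact training code.

```python
import torch
import torch.nn.functional as F

def ldm_training_step(vae, unet, noise_scheduler, text_embeddings, pixel_values):
    # Encode the input frames into latent space (scaled as in Stable Diffusion).
    latents = vae.encode(pixel_values).latent_dist.sample() * 0.18215

    # Sample Gaussian noise and a random diffusion timestep per sample.
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device
    ).long()

    # Forward diffusion: corrupt the clean latents with the sampled noise.
    noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)

    # The U-Net predicts the added noise, conditioned on the text prompt.
    noise_pred = unet(noisy_latents, timesteps,
                      encoder_hidden_states=text_embeddings).sample

    # Standard LDM objective: MSE between predicted and true noise,
    # i.e. a loss on noise in latent space, not on reconstructed pixels.
    return F.mse_loss(noise_pred, noise)
```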

henbucuoshanghai commented 4 months ago

I suppose their fine-tuned network is trained exactly this way.