About training config: unet_use_cross_frame_attention

guoyww / AnimateDiff

Official implementation of AnimateDiff.

https://animatediff.github.io

Apache License 2.0

10.3k stars, 843 forks

About training config: unet_use_cross_frame_attention #206

Status: Open. fangsiff opened this issue 10 months ago.

fangsiff commented 10 months ago

I'm going to train the motion module myself. After a lot of preparation and debugging, my training finally runs successfully, but I can't confirm that the results are right. The trained motion module doesn't work well: the output frames are almost identical, and it's hard to make out anything moving.

Besides, I found that when I turn on unet_use_cross_frame_attention, the module 'SparseCausalAttention2D' cannot be found. Could you please help me with the training config and the missing 'SparseCausalAttention2D' module?

tacit0428 commented 10 months ago

Hi, can you tell me your learning rate and training dataset? I also tried to train the motion module on a small dataset. I set batch size = 4 and lr = 3e-5, but the results didn't look good and the training loss didn't decrease.

fangsiff commented 10 months ago

@tacit0428 Hi, I used the dataset from https://github.com/m-bain/webvid. I only used 30 videos from results_2M_train.csv whose captions contain the keywords: smile, camera, look, (man | woman | girl | boy). I'm also confused by the configuration; the following is my training config:

learning_rate: 8.0e-05
num_examples: 30
max_train_steps: 8000
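The clip selection described above can be reproduced roughly with Python's standard library. This is a sketch under two assumptions: the caption column of results_2M_train.csv is named `name` (check the header of your copy), and a caption must contain all of smile, camera, and look plus one of man, woman, girl, or boy. `select_clips` is a hypothetical helper, not part of AnimateDiff or WebVid.

```python
import csv
import re

# Assumed reading of the keyword rule: all of these substrings must appear...
REQUIRED = ("smile", "camera", "look")
# ...plus at least one person word, matched as a whole word so "man" does not hit "woman".
PERSON = re.compile(r"\b(man|woman|girl|boy)s?\b", re.IGNORECASE)


def select_clips(lines, limit=30, caption_col="name"):
    """Return up to `limit` CSV rows whose caption matches the keyword rule.

    `lines` is any iterable of CSV text lines (an open file works).
    Matching is plain substring matching, so "smiling" does NOT match "smile".
    """
    picked = []
    for row in csv.DictReader(lines):
        caption = row[caption_col].lower()
        if all(word in caption for word in REQUIRED) and PERSON.search(caption):
            picked.append(row)
            if len(picked) == limit:
                break
    return picked
```

For the real file, `with open("results_2M_train.csv", newline="") as f: rows = select_clips(f)` would stream the CSV without loading all of it into memory.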

I can't get a good motion module yet either. Besides, my trained model cannot be used with inference v2, so I think the current training scripts are suited to mm-14 rather than mm-15 and mm-15_v2. I hope it helps.
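One way to see why a self-trained checkpoint won't load with inference v2 is to compare its parameter names and shapes against those of an official checkpoint such as mm-15_v2. The sketch assumes both checkpoints have already been reduced to `{parameter_name: shape}` dictionaries; `diff_state_dict_shapes` is a hypothetical helper, and any parameter names you test it with are your own.

```python
def diff_state_dict_shapes(trained, expected):
    """Compare two {parameter_name: shape} mappings.

    Returns (missing, unexpected, mismatched):
      missing    - names in `expected` that `trained` lacks
      unexpected - names in `trained` that `expected` lacks
      mismatched - shared names whose shapes differ
    """
    missing = sorted(set(expected) - set(trained))
    unexpected = sorted(set(trained) - set(expected))
    mismatched = sorted(
        name
        for name in set(trained) & set(expected)
        if tuple(trained[name]) != tuple(expected[name])
    )
    return missing, unexpected, mismatched
```

With PyTorch, such a dictionary can be built with `{k: tuple(v.shape) for k, v in torch.load(path, map_location="cpu").items()}`, after unwrapping the weights first if the checkpoint nests them under a key like `state_dict`. Missing names point to modules present in the official checkpoint that training never created; shape mismatches point to layers with different dimensions.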

tacit0428 commented 10 months ago

It seems good. I can use inference v2 with mm-15_v2. But I can't turn on unet_use_cross_frame_attention either, and I don't know if it's essential. Did you turn on this parameter in your training? Also, I found that the training results are not good when I use Stable Diffusion without any LoRA.

fangsiff commented 10 months ago

My result GIFs lack animated motion; the movements are barely noticeable. I didn't turn on unet_use_cross_frame_attention. According to the paper, I guess it helps consistency across frames, but SparseCausalAttention2D cannot be found. I don't know whether LoRA is necessary, but I think the motion module is supposed to work well without other control components. It seems that I need to put more effort into the config, or wait for the unpublished details to be released.
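To put a number on "barely noticeable" motion when comparing checkpoints or configs, one crude proxy (not something used in this thread or in the AnimateDiff repo) is the mean absolute pixel difference between consecutive frames. A minimal sketch, assuming the GIF has already been decoded into a NumPy array of shape (frames, height, width, channels); `mean_frame_difference` is a hypothetical helper.

```python
import numpy as np


def mean_frame_difference(frames):
    """Average absolute change between consecutive frames.

    frames: array-like of shape (num_frames, height, width, channels),
    e.g. uint8 pixels in [0, 255]. Returns 0.0 for fewer than two frames.
    """
    frames = np.asarray(frames, dtype=np.float32)  # avoid uint8 wrap-around
    if frames.shape[0] < 2:
        return 0.0
    diffs = np.abs(frames[1:] - frames[:-1])  # per-pixel change between neighbors
    return float(diffs.mean())
```

A value near zero means the frames are nearly identical. Because flicker and noise also raise the score, it is best used to compare runs that share a prompt and seed rather than as an absolute measure of motion quality.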

tacit0428 commented 10 months ago

Yes, I agree. I think the provided motion module behaves the same as in your results: when I tried the author's model weights, the generated motion was also slight. I think the motion module is tuned to favor good consistency. I will try to train it with different configs later.

liuchangzong commented 10 months ago

Hi, have you solved this problem yet, the missing "SparseCausalAttention2D" module? I have been reading the source code, and in the attention.py file which can be found in this link, I found that "SparseCausalAttention2D" is never defined or imported. It's really weird:

    if unet_use_cross_frame_attention:
        self.attn1 = SparseCausalAttention2D(
            query_dim=dim,
            heads=num_attention_heads,
            dim_head=attention_head_dim,
            dropout=dropout,
            bias=attention_bias,
            cross_attention_dim=cross_attention_dim if only_cross_attention else None,
            upcast_attention=upcast_attention,
        )

I'm not sure whether I'm just misreading the code, but it looks like the class "SparseCausalAttention2D", which should be defined in attention.py, is missing.
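Since the class is not in the repository, its exact behavior is unknown. Its name suggests the sparse-causal attention from Tune-A-Video, where each frame's queries attend to the keys and values of the first frame and of the immediately preceding frame. The NumPy sketch below illustrates only that idea, under this assumption: it is not AnimateDiff's implementation, it omits the projections, heads, and dropout that the constructor arguments above imply, and `sparse_causal_attention` and `softmax` are hypothetical names.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sparse_causal_attention(q, k, v):
    """Sparse-causal attention over video frames (assumed Tune-A-Video style).

    q, k, v: arrays of shape (frames, tokens, dim), already projected.
    Frame i attends to the tokens of frame 0 and frame i-1
    (frame 0 attends to itself twice, which equals plain self-attention).
    """
    f, n, d = q.shape
    prev = np.arange(f) - 1
    prev[0] = 0  # first frame has no predecessor; reuse frame 0
    # Context for frame i: tokens of frame 0 concatenated with tokens of frame i-1.
    k_ctx = np.concatenate([k[[0] * f], k[prev]], axis=1)  # (f, 2n, d)
    v_ctx = np.concatenate([v[[0] * f], v[prev]], axis=1)  # (f, 2n, d)
    scores = q @ k_ctx.transpose(0, 2, 1) / np.sqrt(d)     # (f, n, 2n)
    return softmax(scores, axis=-1) @ v_ctx                # (f, n, d)
```

Compared with full cross-frame attention, the context per frame stays at two frames' worth of tokens regardless of clip length, which is the usual motivation for the sparse variant.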

fangsiff commented 10 months ago

@liuchangzong No, not yet. The implementation of this class is unpublished.

liuchangzong commented 10 months ago

OK, thanks a lot.

ssvicnent commented 10 months ago

Same problem here; the motion results are poor.

caojiehui commented 6 months ago

Is SparseCausalAttention2D still unpublished?