Open junwenxiong opened 4 months ago
Why is encoder_hidden_state passed to the motion module? As described in the paper, the motion module is vanilla temporal attention (self-attention across frames), not cross-attention. https://github.com/guoqincode/Open-AnimateAnyone/blob/f3e014e0c985cd06e1955169cb381aa61482a968/models/unet_3d_blocks.py#L391-L392
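To make the distinction concrete, here is a minimal NumPy sketch of the two variants as I understand them. It is not the repository's code: the function names, the tensor layout, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention(queries, context):
    # Single-head scaled dot-product attention. Learned projections are
    # omitted (an assumption), so keys and values are the context itself.
    d = queries.shape[-1]
    scores = queries @ np.swapaxes(context, -1, -2) / np.sqrt(d)
    return softmax(scores, axis=-1) @ context


def temporal_self_attention(hidden_states):
    # hidden_states: (batch, frames, positions, channels) -- hypothetical layout.
    # Each spatial position attends over its own frames; no external input.
    x = np.swapaxes(hidden_states, 1, 2)  # (batch, positions, frames, channels)
    out = attention(x, x)
    return np.swapaxes(out, 1, 2)


def temporal_cross_attention(hidden_states, encoder_hidden_states):
    # encoder_hidden_states: (batch, tokens, channels) -- hypothetical layout.
    # Frame features are the queries; keys/values come from the encoder tokens.
    x = np.swapaxes(hidden_states, 1, 2)        # (batch, positions, frames, channels)
    ctx = encoder_hidden_states[:, None, :, :]  # broadcast over positions
    out = attention(x, ctx)
    return np.swapaxes(out, 1, 2)
```

In the first function the output depends only on the frame features, which matches my reading of the paper. In the second, the output changes whenever encoder_hidden_states changes, which is the behavior I would not expect from a vanilla temporal attention module.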