lucidrains / video-diffusion-pytorch

Implementation of Video Diffusion Models, Jonathan Ho's new paper extending DDPMs to Video Generation - in Pytorch
MIT License

Duplicate dividing in relative positional encoding #11

Open songweige opened 2 years ago

songweige commented 2 years ago

Hey @lucidrains, thanks for maintaining these implementations. In line 88 of https://github.com/lucidrains/video-diffusion-pytorch/blob/f55f1b0824b1be7d2bb555ed7a5d612eff8ad5d0/video_diffusion_pytorch/video_diffusion_pytorch.py#L84-L88, max_exact is set to half of num_buckets, but num_buckets was already halved in line 84.

I think the second division is redundant, and the assignment should simply be:

 max_exact = num_buckets

oxjohanndiep commented 2 years ago

I suggest reading the paper "On Scalar Embedding of Relative Positions in Attention Models", which explains the bucketing function implemented here.
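
To make the two divisions concrete, here is a minimal scalar sketch in plain Python. It assumes the function follows the usual T5-style bidirectional bucketing; it is not copied from the repository, and the `halve_max_exact` flag is a hypothetical switch added only to compare the current behaviour with the change proposed above. The sign convention (which direction gets the upper half of the buckets) is also an assumption.

```python
import math

def relative_position_bucket(relative_position, num_buckets=32, max_distance=128,
                             halve_max_exact=True):
    """Scalar sketch of T5-style bidirectional bucketing (assumed form).

    halve_max_exact is a hypothetical flag: True mimics the second halving,
    False mimics the proposed `max_exact = num_buckets`.
    """
    n = -relative_position
    bucket = 0

    # First halving: split the buckets between the two directions.
    num_buckets //= 2
    if n < 0:
        bucket += num_buckets
    n = abs(n)

    # Second halving: within one direction, half of the buckets hold exact
    # offsets and the other half are spaced logarithmically.
    max_exact = num_buckets // 2 if halve_max_exact else num_buckets
    if n < max_exact:
        return bucket + n

    # Log-spaced buckets for distances from max_exact up to max_distance.
    val_if_large = max_exact + int(
        math.log(n / max_exact) / math.log(max_distance / max_exact)
        * (num_buckets - max_exact)
    )
    # Anything beyond the last bucket is clamped into it.
    return bucket + min(val_if_large, num_buckets - 1)
```

Under these assumptions, with the defaults each direction gets 16 buckets: offsets 0 to 7 map to their own bucket and larger offsets share the 8 log-spaced ones. With `halve_max_exact=False` the factor `num_buckets - max_exact` becomes zero, so every offset of 16 or more falls into the same final bucket as offset 15 and the log-spaced range disappears. That suggests the two divisions serve different purposes rather than being a duplicate.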