jeonsworld / ViT-pytorch

PyTorch reimplementation of the Vision Transformer (An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale)
MIT License
1.95k stars, 374 forks

How to deal with a video that is 5 minutes long? #16

Open dotsonliu opened 3 years ago

Alexbeast-CN commented 2 years ago

Maybe you're looking for ViViT? https://github.com/rishikksh20/ViViT-pytorch
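
For a long video, clip-based video models are usually fed short fixed-length clips rather than the whole thing at once. Below is a minimal sketch of picking frame indices for a few clips spread evenly over the video. It is not taken from either repository: the function name `sample_clip_indices`, the 30 fps frame rate, the clip length of 16, and the stride of 2 are all assumptions for illustration.

```python
def sample_clip_indices(num_frames, clip_len=16, stride=2, num_clips=4):
    """Return `num_clips` lists of frame indices spread evenly over a video.

    Hypothetical helper: each clip holds `clip_len` indices spaced `stride`
    frames apart. None of these defaults come from ViT-pytorch or ViViT-pytorch.
    """
    # Number of raw frames one clip covers, first to last sampled frame.
    span = (clip_len - 1) * stride + 1
    if num_frames < span:
        raise ValueError("video is shorter than one clip")

    # Latest frame at which a clip can start and still fit in the video.
    last_start = num_frames - span
    if num_clips == 1:
        starts = [last_start // 2]  # a single clip is centered
    else:
        # Evenly spaced starts from the first frame to the last valid start.
        starts = [(i * last_start) // (num_clips - 1) for i in range(num_clips)]

    return [list(range(s, s + span, stride)) for s in starts]


# A 5-minute video at an assumed 30 fps has 5 * 60 * 30 = 9000 frames.
clips = sample_clip_indices(9000, clip_len=16, stride=2, num_clips=4)
```

Each index list could then be used to decode the matching frames and run the model once per clip, for example averaging the per-clip predictions. Whether that suits a given task is a judgment call, not something stated in this thread.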