lucidrains / TimeSformer-pytorch

Implementation of TimeSformer from Facebook AI, a pure attention-based solution for video classification
MIT License

How to deal with long video data that has 1000 frames #3

Open dotsonliu opened 3 years ago

dotsonliu commented 3 years ago

How should I deal with long video data that has 1000 frames? The image size is 224×224 and the patch size is 16×16, so there are 196,000 patches in total. How do I handle that? Do I put them all into the transformer at once?
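Here is how the numbers in the question work out. This is a minimal sketch. The helper names `num_patch_tokens` and `full_attention_entries` are hypothetical, and the count ignores any CLS token:

```python
def num_patch_tokens(frames: int, image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a clip of square frames (no CLS token)."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be divisible by patch size")
    per_frame = (image_size // patch_size) ** 2  # 224 // 16 = 14 -> 14 * 14 = 196
    return frames * per_frame


def full_attention_entries(tokens: int) -> int:
    """Entries in one N x N attention matrix (one head, one layer)."""
    return tokens * tokens


tokens = num_patch_tokens(1000, 224, 16)
print(tokens)                          # 196000
print(full_attention_entries(tokens))  # 38416000000
```

A single full (joint space-time) attention matrix over all 196,000 tokens would have about 3.8×10^10 entries. That is roughly 154 GB in float32 for just one head in one layer, so feeding everything in at once is not practical.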

tcapelle commented 3 years ago

No, you have to skip frames or use fewer of them. The paper deals with this issue by skipping frames (one every 15).
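The following is a minimal sketch of fixed-stride frame skipping. It assumes the frames are already decoded into a Python list or another sliceable sequence. `subsample_frames` and its `max_frames` cap are hypothetical illustrations and are not part of TimeSformer-pytorch:

```python
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Sequence[T], stride: int = 15,
                     max_frames: Optional[int] = None) -> List[T]:
    """Keep every `stride`-th frame, optionally capped at `max_frames` frames."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    kept = list(frames[::stride])  # indices 0, stride, 2*stride, ...
    if max_frames is not None:
        kept = kept[:max_frames]   # hypothetical cap to bound the token count
    return kept


frames = list(range(1000))         # stand-in for 1000 decoded frames
clip = subsample_frames(frames)    # stride 15 keeps 67 frames
tokens = len(clip) * (224 // 16) ** 2
print(len(clip), tokens)           # 67 13132
```

Even with a stride of 15, 13,132 tokens is still a lot for attention. A smaller cap such as `max_frames=8` (an assumed value; tune it to your memory budget) brings the count down to 1,568 tokens.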

dfan commented 3 years ago

You could also try substituting a sparse or linear self-attention mechanism.