Open dotsonliu opened 3 years ago
How should I deal with long video data? With 1000 frames, an image size of 224×224, and a patch size of 16×16, there are 196,000 patches. Should they all be fed into the transformer at once?
No, you have to skip frames or use fewer of them. The paper deals with this issue by skipping frames (one every 15).
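For a rough idea of what frame skipping buys you, here is a minimal sketch in plain Python. The helper names `num_tokens` and `subsample_frames` are made up for illustration and are not part of any library; the stride of 15 is the one mentioned above, and the frame list is just a stand-in for decoded frames.

```python
def num_tokens(num_frames, img_size=224, patch_size=16):
    """Number of patch tokens for a clip of square frames."""
    patches_per_side = img_size // patch_size  # 224 // 16 = 14
    return num_frames * patches_per_side * patches_per_side


def subsample_frames(frames, stride=15):
    """Keep one frame out of every `stride` frames (hypothetical helper)."""
    return frames[::stride]


frames = list(range(1000))  # stand-in for 1000 decoded frames
kept = subsample_frames(frames, stride=15)

print(num_tokens(len(frames)))  # tokens without skipping
print(len(kept), num_tokens(len(kept)))  # frames and tokens after skipping
```

With these numbers the sequence shrinks from 196,000 tokens to 67 frames × 196 patches = 13,132 tokens, which is still long but far more manageable for full self-attention.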
Could also try substituting in a sparse / linear self-attention mechanism
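To illustrate what a linear self-attention substitute could look like, below is a small NumPy sketch of one common variant: kernelized attention with an `elu(x) + 1` feature map. This is an assumption on my part about which linear mechanism to use, not something taken from the code discussed in this thread, and it is single-head and non-causal for brevity. The point is that it never forms the N×N attention matrix, so cost grows linearly with the number of patches.

```python
import numpy as np


def feature_map(x):
    # elu(x) + 1: equals x + 1 for x > 0 and exp(x) otherwise, so always positive.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention for q, k of shape (n, d) and v of shape (n, d_v)."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v               # (d, d_v) summary of keys and values, O(n * d * d_v)
    z = q @ k.sum(axis=0)      # (n,) per-query normalizer
    return (q @ kv) / (z[:, None] + eps)


rng = np.random.default_rng(0)
q, k, v = rng.standard_normal((3, 196, 64))  # one frame's worth of 196 patch tokens
out = linear_attention(q, k, v)
print(out.shape)
```

The output has the same shape as `v`, so in principle it can stand in for a softmax attention layer, though the model would need to be trained (or fine-tuned) with it since the attention weights are not the same as softmax weights.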