mzolfaghari / ECO-efficient-video-understanding

Code and models of the paper "ECO: Efficient Convolutional Network for Online Video Understanding", ECCV 2018
MIT License

About the weight sharing in the architecture overview of ECO or ECOLite #6

Closed Ai-is-light closed 6 years ago

Ai-is-light commented 6 years ago

@mzolfaghari Thanks for your excellent idea, paper, and repo. After reading your paper, I'm a little confused about the weight sharing in your architecture overview. Would you mind giving me more details about it? I thought the N frames corresponded to N copies of Inception_3c in the 2D-Net, so what does weight sharing mean here? Thanks. Looking forward to any replies.

mzolfaghari commented 6 years ago

Hi @Ai-is-light,

Actually, we have only one instance of Inception_3c in the 2D Net, and every frame passes through this same network. By weight sharing in the paper, we meant that all N frames use the same 2D Net. If you check the model definition, you'll find that we stacked the N frames and reshaped the input so that these N frames form a batch. For example, if we have 4 frames and the batch size is 32, then the input to the 2D Net will have a batch size of 128.
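For readers who want to see the reshaping idea concretely, here is a minimal NumPy sketch. It is not the repository's model definition: `fold_frames_into_batch`, `shared_2d_net`, the 8x8 frame size, and the single 1x1 convolution standing in for the 2D Net are all assumptions made for illustration. It only shows how N frames can be folded into the batch dimension so that one set of weights processes every frame.

```python
import numpy as np

def fold_frames_into_batch(clips):
    """Reshape (batch, frames, C, H, W) into (batch * frames, C, H, W)."""
    b, n, c, h, w = clips.shape
    return clips.reshape(b * n, c, h, w)

def shared_2d_net(images, weight):
    """Hypothetical stand-in for the 2D Net: a single 1x1 convolution.

    images: (B, C_in, H, W), weight: (C_out, C_in)
    """
    return np.einsum("oc,bchw->bohw", weight, images)

batch_size, num_frames = 32, 4
clips = np.random.rand(batch_size, num_frames, 3, 8, 8).astype(np.float32)
weight = np.random.rand(16, 3).astype(np.float32)  # the only copy of the 2D weights

flat = fold_frames_into_batch(clips)    # frames become extra batch entries
features = shared_2d_net(flat, weight)  # one network call covers all frames
# Restore the frame axis so later layers can see the temporal dimension.
per_clip = features.reshape(batch_size, num_frames, *features.shape[1:])

print(flat.shape, features.shape, per_clip.shape)
# (128, 3, 8, 8) (128, 16, 8, 8) (32, 4, 16, 8, 8)
```

Because `weight` is created once and applied to the folded batch, every frame is processed with identical parameters, which is the sense of "weight sharing" described above.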

mzolfaghari commented 6 years ago

Please feel free to re-open the issue if you have further questions.