Closed pinkfloyd06 closed 6 years ago
Hi @pinkfloyd06 , This convolutional two stream fusion is only useful when you are leveraging the spatio-temporal features of the video. There is fusion of spatial and temporal features before fully connected layers, so you won't be able to extract features at the frame level.
Hello @tomar840 ,
Let me first thank you for your work.
Can l extract features representation (from last fully connected layer) for each frame (rather than for the whole video ?
Thank you a lot