hbilen / dynamic-image-nets

Dynamic Image Networks for Action Recognition

Temporal Max Pooling #22

Open vinitmuchhala opened 5 years ago

vinitmuchhala commented 5 years ago

Love the work! I am just having difficulty understanding the architecture of the SI + DI model. From what I can see in the resnext.mat model, it uses a temporal max pooling layer just before the softmax layer. The inputs to the temporal max pooling layer are listed as the merged conv7 features and Video2. I am assuming the merged conv7 features come from running the dynamic image through the ResNeXt model. Where does Video2 come from? Are we supposed to pass the whole video or just a single frame from the video clip?
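For readers following the question, here is a minimal sketch of what a temporal max pooling step with two inputs could look like. It is written under a loud assumption that the thread does not confirm: that the second input (called Video2 in the question) is a per-sample video index telling the layer which feature vectors belong to the same video. The function name `temporal_max_pool` and the toy values are hypothetical and are not taken from resnext.mat or the repository code.

```python
def temporal_max_pool(features, video_ids):
    """Element-wise max over all feature vectors that share a video id.

    features:  list of equal-length feature vectors, one per clip or
               dynamic image (hypothetical stand-in for the conv7 output).
    video_ids: list of the same length; video_ids[i] says which video
               features[i] came from (ASSUMED meaning of the second input).
    Returns a dict mapping each video id to its pooled feature vector.
    """
    if len(features) != len(video_ids):
        raise ValueError("features and video_ids must have the same length")

    pooled = {}
    for vec, vid in zip(features, video_ids):
        if vid not in pooled:
            # First vector seen for this video: start from a copy of it.
            pooled[vid] = list(vec)
        else:
            # Keep the larger value in every feature dimension.
            pooled[vid] = [max(a, b) for a, b in zip(pooled[vid], vec)]
    return pooled


# Toy example: three feature vectors, the first two from video 1,
# the third from video 2.
features = [
    [0.1, 0.9, 0.2],
    [0.4, 0.3, 0.8],
    [0.7, 0.1, 0.1],
]
video_ids = [1, 1, 2]

pooled = temporal_max_pool(features, video_ids)
print(pooled)
```

Under this assumption, each video ends up with a single pooled vector that is then passed to the softmax, regardless of how many clips or dynamic images were extracted from it. Whether the model in question works this way would need to be checked against the layer definition in the repository.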

nianniana commented 3 years ago

Hello, I am also interested in this work. Would you please share the resnext.mat file with me? The link the author provided for downloading it is no longer valid. Thank you very much! I look forward to your reply.