VSunN closed this issue 6 years ago
Hi, improvement B in your ActivityNet report mentions that you adopt TSN and P3D models pretrained on the Kinetics-400 dataset for video feature extraction. I wonder whether the models were fine-tuned on the ActivityNet dataset. If not, are you using the output of the last pooling layer as the feature?
Hi @VSunN, the model is not fine-tuned, and we use the output of the classification layer (without softmax) as the feature.
Thank you for your reply. Does that mean that for the P3D network, when the input interval is 16 frames, the output feature length is 400 instead of 200? @wzmsltw
Yes
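For anyone reading this later, here is a rough sketch of what "classification layer without softmax as feature" means in terms of shapes. It is not the authors' extraction code: the toy layer, the random weights, and `POOLED_DIM` are placeholders I made up for illustration. The only thing taken from the thread is that the model is pretrained on Kinetics-400, so the classification layer emits 400 raw scores per input snippet, and those scores are kept as the feature.

```python
import math
import random

NUM_CLASSES = 400  # Kinetics-400 has 400 action classes
POOLED_DIM = 512   # placeholder size for the last pooling layer output (assumption)


def classification_layer(pooled, weights, bias):
    """Toy fully connected layer: one raw score (logit) per class."""
    return [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax, shown only for contrast."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)

# Stand-in for the pooled activation of one 16-frame snippet (random data).
pooled = [rng.gauss(0.0, 1.0) for _ in range(POOLED_DIM)]

# Stand-in for pretrained classifier weights (random here, not real weights).
weights = [
    [rng.gauss(0.0, 0.01) for _ in range(POOLED_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

# The feature is the raw classification output, before any softmax.
feature = classification_layer(pooled, weights, bias)

# Softmax would turn the scores into probabilities; per the thread it is skipped.
probs = softmax(feature)

print(len(feature))  # length of the per-snippet feature vector
```

In this sketch the feature length equals the number of classes the model was pretrained on, which is why a Kinetics-400 model gives a 400-dimensional vector per snippet.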