I would like to know the details of the AlexNet architecture you used to extract features from the video frames.
As mentioned in the white paper, you use "Alexnet fc7 features (4096-dimensional) to represent each video frame and tune the parameters in each method to have the best performance."
Which parameters did you tune or change?