First of all, thank you very much for your hard work. I am trying to build an activity recognition model on a subset of the UCF101 dataset (I am using the top 20 activity labels).
So far, I have used a pre-trained VGG16 network to extract features from the individual frames of the videos. The final shape I got from the VGG16 network is (20501, 7, 7, 512) for the train set. I now want to pass these extracted features to an LSTM-based network, and I am a bit confused about how I should reshape them.
How many time steps should I pass in, and how many features per time step?
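For context, here is one way the reshape is often thought about: the time steps are the frames of one clip, and the features are whatever is left after collapsing the 7×7×512 spatial grid per frame. The sketch below is only an illustration with made-up numbers (the 41-frame clip length and 500-video count are assumptions, not taken from the dataset), using NumPy to show the shape transformation:

```python
import numpy as np

# Hypothetical counts: assume each clip contributes SEQ_LEN frames, so the
# frame dimension factors as N_VIDEOS * SEQ_LEN. (The real set has 20501
# frames, which would need per-video grouping or padding instead.)
SEQ_LEN = 41
N_VIDEOS = 500

# Dummy stand-in for the VGG16 output: (total_frames, 7, 7, 512)
features = np.zeros((N_VIDEOS * SEQ_LEN, 7, 7, 512), dtype=np.float32)

# Step 1: collapse the 7x7 spatial grid per frame. Global average pooling
# keeps 512 features per frame; flattening instead would keep
# 7 * 7 * 512 = 25088 features per frame.
per_frame = features.mean(axis=(1, 2))  # shape: (total_frames, 512)

# Step 2: group consecutive frames of the same clip into sequences.
# A typical RNN input shape is (batch, time_steps, features).
sequences = per_frame.reshape(N_VIDEOS, SEQ_LEN, -1)

print(sequences.shape)  # (500, 41, 512)
```

Under these assumptions, the LSTM would see 41 time steps of 512 features each per sample; with flattening instead of pooling it would be 41 time steps of 25088 features. The key constraint is that all frames of one video must stay together in one sequence, in temporal order.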