Enquiries regarding Data Preprocessing

Thanks for making this interesting project open-source. I am trying to replicate the work discussed in the paper. However, training on the Hollywood2 and UCF11 datasets does not converge. I suspect that something is wrong with the features I extracted.

I use the Python interface of Caffe to extract features from the "inception_5b/output" layer of GoogLeNet. The features have shape (1024, 7, 7). According to other forum posts, the shape should be (7, 7, 1024), so I swapped the axes accordingly. Is this a difference between the MATLAB and Python interfaces?
Among the 1024 feature maps, approximately 35% consist only of zeros. Is that normal?
In the MATLAB script, how do you specify the name of the feature layer you intend to use, such as "inception_5b/output"? The script simply calls scores = caffe('forward', {input_data{i}});.
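
For reference, here is a minimal NumPy sketch of the two checks described above: reordering a (1024, 7, 7) array to (7, 7, 1024) and measuring how many feature maps are entirely zero. It does not call Caffe. The array is synthetic stand-in data, and the helper names to_channels_last and zero_map_fraction are hypothetical. I do not know which axis order the training code expects, so the sketch only shows that the two common ways of getting a (7, 7, 1024) shape are not interchangeable.

```python
import numpy as np


def to_channels_last(features):
    """Reorder a (C, H, W) array to (H, W, C), keeping H and W in place."""
    return np.transpose(features, (1, 2, 0))


def zero_map_fraction(features):
    """Return the fraction of channels in a (C, H, W) array that are all zero."""
    flat = features.reshape(features.shape[0], -1)
    return float(np.mean(~flat.any(axis=1)))


# Synthetic stand-in for one frame's features (assumption: not real Caffe output).
rng = np.random.default_rng(0)
features = rng.random((1024, 7, 7)).astype(np.float32)
features[:358] = 0.0  # zero out 358 of the 1024 maps, roughly 35%

hwc = to_channels_last(features)    # axes become (H, W, C)
whc = np.swapaxes(features, 0, 2)   # axes become (W, H, C): spatial axes are exchanged

print(hwc.shape, whc.shape)
print(zero_map_fraction(features))
print(np.array_equal(hwc, whc))
```

Both results have shape (7, 7, 1024), but np.swapaxes(features, 0, 2) also exchanges the two spatial axes, so the arrays differ unless every map is symmetric. Whichever layout the MATLAB pipeline produced is the one to match.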

Any help would be greatly appreciated :-)

kracwarlock / action-recognition-visual-attention
