kracwarlock / action-recognition-visual-attention

Action recognition using soft attention based deep recurrent neural networks
http://www.cs.toronto.edu/~shikhar/projects/action-recognition-attention

Can you elaborate on how to do data preprocessing? #6

Closed YantianZha closed 8 years ago

YantianZha commented 8 years ago

Hi,

I think I need to prepare four preprocessed files (https://github.com/kracwarlock/action-recognition-visual-attention/tree/master/util). However, I'm confused about how to generate "train_features.h5".

Could you please share the code that does this? I would appreciate it even more if you could share all of the code for these preprocessing steps.

Thanks again!

GerardoHH commented 8 years ago

Hi, I have the same problem. Any suggestions? Thank you!

frajem commented 8 years ago

Has anyone done the pre-processing in Python?

kracwarlock commented 8 years ago

@YantianZha @frajem @GerardoHH Hi. I am sorry for the long delay. I was very busy with my thesis and graduation. I am no longer at the University of Toronto but will try to reply here regularly.

  1. I extracted the features from each individual frame after resizing the frames to 256x256.
  2. @YknZhu (Yukun Zhu in my paper) extracted the features for me using a MATLAB script, which I am sharing here: http://www.cs.toronto.edu/~yukun/extracting_feature.m. To combine the individual files generated by this script, I used https://gist.github.com/kracwarlock/96499936487d6125dd010319669c6648
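
For readers without access to the gist, here is a minimal h5py sketch of what such a combining step could look like; the per-video file layout and dataset names are assumptions for illustration, not the original code:

```python
import glob

import h5py
import numpy as np

# Assumption: one feature file per video, each holding an
# (n_frames, 7*7*1024) matrix under the dataset name 'features'.
files = sorted(glob.glob('features/*.h5'))

chunks = []
for fname in files:
    with h5py.File(fname, 'r') as f:
        chunks.append(f['features'][:])

# Stack every video's frame features into one (total_frames, 7*7*1024)
# matrix and write the combined training file.
all_feats = np.concatenate(chunks, axis=0)
with h5py.File('train_features.h5', 'w') as out:
    out.create_dataset('features', data=all_feats)
```
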
nixingyang commented 8 years ago

Hi, I am using the h5py package to save the deep features extracted from GoogLeNet. The file size of my training data set is about 39 GB, while yours is about 8.6 GB. Could you please point out the possible reasons behind such a discrepancy? BR.

YantianZha commented 8 years ago

If I were you, I would just go back to using the MATLAB code. BTW, I'm also using h5py and have been facing the same issue.

nixingyang commented 8 years ago

@YantianZha I think I have found the root cause of this discrepancy. In case you use h5py, please check this website. You may set the compression parameter to gzip; the file size will drop dramatically. MATLAB presumably applies some compression by default.
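
A minimal sketch of the difference (the dataset name and placeholder array are illustrative):

```python
import h5py
import numpy as np

feats = np.zeros((10000, 7 * 7 * 1024), dtype='float32')  # placeholder features

# Uncompressed: file size is essentially the raw array size (~2 GB here).
with h5py.File('features_raw.h5', 'w') as f:
    f.create_dataset('features', data=feats)

# Gzip-compressed: h5py chunks and compresses the dataset transparently,
# and reading it back requires no extra code. Real conv features (sparse
# after ReLU) compress far better than dense random data would.
with h5py.File('features_gzip.h5', 'w') as f:
    f.create_dataset('features', data=feats, compression='gzip', compression_opts=4)
```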

GerardoHH commented 8 years ago

@kracwarlock

Hi, thank you for the MATLAB scripts (I had to modify them a little bit); I finally generated the h5 files for UCF11. Now, when I run the script: "THEANO_FLAGS='floatX=float32,device=gpu0,mode=FAST_RUN,nvcc.fastmath=True' python -m scripts.evaluate_ucf11"

I got the error:

src/actrec.py:744: FutureWarning: comparison to None will result in an elementwise object comparison in the future.
  if x == None:
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/local/caffe/action-recognition-visual-attention-master/scripts/evaluate_ucf11.py", line 87, in <module>
    main(0, options)
  File "/usr/local/caffe/action-recognition-visual-attention-master/scripts/evaluate_ucf11.py", line 45, in main
    fps=params['fps'][0]
  File "src/actrec.py", line 749, in train
    cost = f_grad_shared(x, mask, y)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 864, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/link.py", line 314, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/usr/local/lib/python2.7/dist-packages/theano/compile/function_module.py", line 852, in __call__
    outputs = self.fn()
  File "/usr/local/lib/python2.7/dist-packages/theano/gof/op.py", line 865, in rval
    r = p(n, [x[0] for x in i], o)
  File "/usr/local/lib/python2.7/dist-packages/theano/tensor/subtensor.py", line 2160, in perform
    out[0] = inputs[0].__getitem__(inputs[1:])
IndexError: index 11 is out of bounds for axis 1 with size 11
Apply node that caused the error: AdvancedSubtensor(HostFromGpu.0, ARange{dtype='int64'}.0, Reshape{1}.0)
Toposort index: 404
Inputs types: [TensorType(float32, matrix), TensorType(int64, vector), TensorType(int64, vector)]
Inputs shapes: [(3840, 11), (3840,), (3840,)]
Inputs strides: [(44, 4), (8,), (8,)]
Inputs values: ['not shown', 'not shown', 'not shown']
Outputs clients: [[GpuFromHost(AdvancedSubtensor.0)]]

Backtrace when the node is created:
  File "src/actrec.py", line 415, in build_model
    cost = -tensor.log(probs[tensor.arange(n_timesteps*n_samples), tmp] + 1e-8)

I think the error is related to the features stored in the h5 file.

Any suggestions?
Thank you for your help!

kracwarlock commented 8 years ago

@GerardoHH This error means that for some variable x the code is trying to access x[11], but x only has size 11 (valid indices x[0] to x[10]). Most probably you have n_actions or some other hyperparameter set incorrectly (smaller than its actual value for your dataset) in your evaluate_ucf11.py file.
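
A quick sanity check along these lines would catch it; the label-file name and format here are hypothetical, so adapt them to however you store your labels:

```python
import numpy as np

n_actions = 11  # must match the value used in evaluate_ucf11.py

# Assumption: one integer class label per line in a text file.
labels = np.loadtxt('train_labels.txt', dtype=int)

# Labels must lie in [0, n_actions); a label equal to 11 triggers exactly
# the "index 11 is out of bounds for axis 1 with size 11" error above.
bad = labels[(labels < 0) | (labels >= n_actions)]
print('out-of-range labels:', np.unique(bad))
```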

GerardoHH commented 8 years ago

@kracwarlock

Hi, thank you for your reply. It was my mistake when generating the labels file. Now the script is running; I suppose it will take a day to finish. I'll let you know my results.

Thank you for all your help. And by the way, I'd like to read your thesis; I'm doing my Ph.D. and I need state-of-the-art references XD.

kracwarlock commented 8 years ago

@GerardoHH That's good news and you are welcome :)

Also, my thesis is available at http://www.cs.toronto.edu/~shikhar/publications/msc-thesis.pdf. It is not state of the art but it is what it is. Attention models still have a long way to go.

Litchiware commented 8 years ago

@kracwarlock @GerardoHH Hi, I downloaded the MATLAB script and tried to generate the feature data with it, but ran into some problems.

How do I get the model_def_file and model_file? The default value for model_def_file is '/u/yukun/Projects/RCNN/caffe/examples/GoogleLeNet/forward_googlenet_outputconv.prototxt' and for model_file it is '/u/yukun/Projects/RCNN/caffe/examples/GoogleLeNet/imagenet_googlenet.caffemodel', but I can't find these two files in my caffe directory.

So I tried using $caffe_root/models/bvlc_googlenet/deploy.prototxt and bvlc_googlenet.caffemodel, but when I run the MATLAB script I get this error: "input data/diff size does not match target blob shape, input data/diff size: [ 224 224 3 128 ] vs target blob shape: [ 224 224 3 10 ]".

It seems that I used the wrong model files, but where can I get the proper ones? Any help would be greatly appreciated.

GerardoHH commented 8 years ago

@Litchiware

I faced the same problem; just match the shape of the blobs to [224 224 3 10], i.e., feed the frames to the network in batches of 10.
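
Alternatively, you can reshape the network's input blob to match the batch you actually feed. A sketch of that in pycaffe, assuming the standard BVLC deploy file (matcaffe has an analogous reshape call):

```python
import caffe

net = caffe.Net('deploy.prototxt', 'bvlc_googlenet.caffemodel', caffe.TEST)

# pycaffe blobs use (N, C, H, W) order, while matcaffe blobs are [W H C N],
# so [224 224 3 10] in MATLAB corresponds to (10, 3, 224, 224) here.
# Resize the input blob to the batch size you actually feed:
net.blobs['data'].reshape(128, 3, 224, 224)
net.reshape()  # propagate the new shape through the rest of the network
```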

GerardoHH commented 8 years ago

Hi @kracwarlock

I finished the training and testing of UCF-11, using 30% / 30% / 30% for testing, training, and validation. My results are: Accuracy: Train 1.0, Valid 0.930124223602, Test 0.957115009747

I think it's OK XD. Thank you for your help.

kracwarlock commented 8 years ago

@Litchiware I think Yukun used the Princeton model (http://vision.princeton.edu/pvt/GoogLeNet/ImageNet/). Try it with @GerardoHH's change.

@GerardoHH That's great :)

kaix90 commented 8 years ago

Hi @GerardoHH Can you share all the scripts needed to generate the h5py file? Thanks.

GerardoHH commented 8 years ago

Hi @calmevtime

Here are the original scripts from @kracwarlock

https://github.com/kracwarlock/action-recognition-visual-attention/issues/6

Find the answer from Apr 17, and GL XD

ae86208 commented 8 years ago

Hi @GerardoHH If I am using kracwarlock's MATLAB code to extract features, how should I modify Princeton's train_val_googlenet.prototxt to extract the convolutional features, like forward_googlenet_outputconv.prototxt does? I have tried erasing everything from the cls1_pool layer onward, but the output scores dim is 1 1 1000 10, which is definitely wrong. Thanks.

GerardoHH commented 8 years ago

hi @ae86208

You should not modify the .prototxt file; just instantiate the net in TEST mode and find the last convolutional layer, of shape (7, 7, 1024). It should be the one before the first FC layer; those are the features used for the LSTM.
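
A minimal pycaffe sketch of that idea; the blob name 'inception_5b/output' and the mean file name are taken from the standard BVLC GoogLeNet setup, so treat them as assumptions if you use the Princeton model:

```python
import numpy as np
import caffe

net = caffe.Net('deploy.prototxt', 'bvlc_googlenet.caffemodel', caffe.TEST)  # TEST mode
net.blobs['data'].reshape(1, 3, 224, 224)  # one frame at a time

# Standard Caffe preprocessing: HWC RGB in [0,1] -> CHW BGR, mean-subtracted.
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
transformer.set_transpose('data', (2, 0, 1))
transformer.set_mean('data', np.load('ilsvrc_2012_mean.npy').mean(1).mean(1))
transformer.set_raw_scale('data', 255)
transformer.set_channel_swap('data', (2, 1, 0))

net.blobs['data'].data[0] = transformer.preprocess('data', caffe.io.load_image('frame.jpg'))
net.forward()

# Last conv layer before the classifier: (1024, 7, 7) per frame; flatten
# to 7*7*1024 to get the feature vector the LSTM attends over.
feat = net.blobs['inception_5b/output'].data[0]
```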

ae86208 commented 8 years ago

Thanks a lot @GerardoHH .

thanhnguyentang commented 7 years ago

Hi @GerardoHH, how much time does the code spend on one batch in your training? I get about 60s per batch (I use 128 samples per batch), which I think is too slow. I'd like to know what times others are getting to see if I can optimize the code.

Thank you :)

zhujiagang commented 7 years ago

Hi @GerardoHH @ae86208 I don't quite understand how to instantiate the net in TEST mode, or where to add or modify something. In the .prototxt files, the TEST-phase data layer has batch_size: 32, crop_size: 224, and mean_file: "imagenet_mean.binaryproto", while extracting_feature.m seems to use a batch size of 4 (because of imseq = cell(1,size(vidFrames,4))) and the mean file ilsvrc_2012_mean.mat. I am also confused about whether, if we don't delete the original output layers in the prototxt, the last conv layer will still match FeatDim = 7 * 7 * 1024 in extracting_feature.m. Sorry, I'm a Caffe newbie.

CCV-Edward commented 7 years ago

@kracwarlock Thanks for your fantastic work. When I downloaded extracting_feature.m, I was confused by bID and had no idea what to do with this part of the code. Could you please give some more detailed information? Best regards :)

YinRui1991 commented 7 years ago

Hi, @GerardoHH Can you share the code used to combine the features into h5 format? The link mentioned above ("To combine the individual files generated by this script, I used https://gist.github.com/kracwarlock/96499936487d6125dd010319669c6648") is no longer available. Thanks!