zhujiagang / DTPP

Deep networks with Temporal Pyramid Pooling. The official implementation of "End-to-end Video-level Representation Learning for Action Recognition" (ICPR 2018).
BSD 2-Clause "Simplified" License

Reproducing the inference #4

Open csantosbh opened 5 years ago

csantosbh commented 5 years ago

I'm currently trying to reproduce inference on arbitrary videos from the HMDB51 dataset using the pretrained weights.

So far, I have:

  1. Compiled OpenCV with the contrib module and cuda support
  2. Successfully compiled caffe (caffe-tpp-net) with CUDA and python support
  3. Set up the PYTHONPATH environment variable so python2.7 can find the compiled modules (see the sketch after this list)
  4. Downloaded the pretrained weights with get_init_models.sh and get_kinetics_pretraining_models.sh
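
For step 3, that amounted to something like the following (the path is a placeholder for wherever caffe-tpp-net was built):

export PYTHONPATH=/path/to/caffe-tpp-net/python:$PYTHONPATH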

Since README.md doesn't mention how to perform inference, I read README_old.md and found that there is a script tools/eval_net.py for that purpose. So I ran:

python tools/eval_net.py hmdb51 1 rgb /var/datasets/hmdb51/ models/hmdb51/flow_tpp_delete_dropout_deploy.prototxt ./init_models/hmdb51_split_1_tsn_flow_reference_bn_inception.caffemodel

But, after a ton of messages, that gave me the following error:

Traceback (most recent call last):
  File "tools/eval_net.py", line 125, in <module>
    video_scores = map(eval_video, eval_video_list)
  File "tools/eval_net.py", line 69, in eval_video
    video_frame_path = f_info[0][vid]
KeyError: '20060723sfjffbumblebeesuitman_run_f_cm_np2_ri_med_1'

The path to the dataset is correct, and I have unpacked all rar files from the HMDB dataset. README_old.md mentions a preprocessing script, scripts/extract_optical_flow.sh, but that script doesn't exist in the master branch.

So, are there any further steps necessary for reproducing inference?

zhujiagang commented 5 years ago

Sorry for the trouble you've had running the code due to my carelessness. I've updated the README.md, and you can refer to eval_net_tpp_hmdb.py for more details. Our trained models on UCF101 and HMDB51 have not been released.

csantosbh commented 5 years ago

Hi @zhujiagang, thank you so much for your swift replies! I'm currently working on reproducing the training with the hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh script on a Titan X. I ran into a few issues and had to modify some files to work around them, namely:

  1. Reducing batch_size to 1 to avoid out-of-memory errors
  2. Setting new_width and new_height in the train prototxt, since my extracted frames were not pre-resized

I've been running the training for several hours now and this is the latest iteration output:

I1031 15:06:09.097961  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 877.991 > 40) by scale factor 0.0455585
I1031 15:06:21.079704  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 700.793 > 40) by scale factor 0.0570782
I1031 15:06:33.580734  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1143.5 > 40) by scale factor 0.0349803
I1031 15:06:46.480859  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1017.31 > 40) by scale factor 0.0393192
I1031 15:06:58.553812  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1014.26 > 40) by scale factor 0.0394377
I1031 15:07:10.659497  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1247.51 > 40) by scale factor 0.0320639
I1031 15:07:23.238704  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 968.414 > 40) by scale factor 0.0413046
I1031 15:07:35.426038  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1048.87 > 40) by scale factor 0.0381362
I1031 15:07:48.065246  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1141.9 > 40) by scale factor 0.0350292
I1031 15:08:00.421540  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 734.556 > 40) by scale factor 0.0544547
I1031 15:08:12.666364  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 748.326 > 40) by scale factor 0.0534526
I1031 15:08:25.080652  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1113.1 > 40) by scale factor 0.0359356
I1031 15:08:37.743530  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 878.843 > 40) by scale factor 0.0455144
I1031 15:08:49.748545  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1046.37 > 40) by scale factor 0.0382275
I1031 15:09:01.976547  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1059.37 > 40) by scale factor 0.0377582
I1031 15:09:14.561251  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1073.01 > 40) by scale factor 0.0372784
I1031 15:09:27.069676  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1176.19 > 40) by scale factor 0.0340082
I1031 15:09:39.748011  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1173.07 > 40) by scale factor 0.0340987
I1031 15:09:51.674002  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1058.21 > 40) by scale factor 0.0377996
I1031 15:10:04.277062  4431 solver.cpp:625] Gradient clipping: scaling down gradients (L2 norm 1022.46 > 40) by scale factor 0.0391213
I1031 15:10:16.263995  4431 solver.cpp:240] Iteration 4740, loss = 0.184195
I1031 15:10:16.264025  4431 solver.cpp:255]     Train net output #0: accuracy = 1
I1031 15:10:16.264046  4431 solver.cpp:255]     Train net output #1: loss = 0.0173458 (* 1 = 0.0173458 loss)
I1031 15:10:16.264051  4431 solver.cpp:640] Iteration 4740, lr = 0.001
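
As far as I can tell, each scale factor is just clip_gradients divided by the gradient L2 norm; a quick Python check against the first log line above:

# Caffe scales all gradients by clip_gradients / l2_norm whenever the
# global L2 norm exceeds clip_gradients (40 in this solver)
l2_norm = 877.991        # from the first log line
clip_gradients = 40.0
scale = clip_gradients / l2_norm if l2_norm > clip_gradients else 1.0
print(scale)             # ~0.0455585, matching the logged factor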

The gradient scaling and the accuracy of 1 worry me a bit. Is that expected?

If I understand the paper correctly, I should also train a temporal model with the hmdb_scripts_split_1/train_flow_tpp_delete_dropout_split_1.sh script, is that correct? Do I need to preprocess the input images somehow? Is that what lib/dense-flow is for? Is there a script that automates this process?

zhujiagang commented 5 years ago

Regarding the out-of-memory issue with batch_size: how did you compile caffe-tpp-net? Did you install OpenMPI before compiling it? Please refer to the commands in the TSN repo for details on compiling Caffe with OpenMPI; I've found that compiling their Caffe with OpenMPI saves a lot of memory.
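
If I remember their build steps correctly, the OpenMPI-enabled build is invoked roughly like this (treat the exact flags as an assumption and check the TSN README):

MPI_PREFIX=/path/to/openmpi bash build_all.sh MPI_ON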

Which Caffe are you using? I don't remember needing to set new_width and new_height with caffe-tpp-net; crop_size: 224 is enough.

I used the script in the TSN repo to extract frames and optical flow; please refer to that repo.
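
If I recall their README correctly, the extraction script is used roughly like this:

bash scripts/extract_optical_flow.sh SRC_FOLDER OUT_FOLDER NUM_WORKER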

I think your training process is fine with batch_size: 1, though once you can run with batch_size: 4 the training curve may change. I also want to remind you that I decreased the learning rate by hand whenever validation accuracy stopped increasing, as the paper says:

Instead of decreasing the learning rate according to a fixed schedule, the learning rate is lowered by a factor of 10 after validation error saturates.

A better way would be to find a fixed schedule, which is more appropriate for ablation studies.
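
For example, such a schedule could be written directly in the solver prototxt; the step values below are placeholders, not the ones from the paper:

lr_policy: "multistep"
base_lr: 0.001
gamma: 0.1          # divide the learning rate by 10 at each stepvalue
stepvalue: 3000
stepvalue: 6000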

For the base learning rates of the RGB and flow streams, you should also follow the paper.

csantosbh commented 5 years ago

Hello @zhujiagang, thanks again for your clear answer!

I probably needed to set new_width and new_height because I had extracted the dataset frames myself using ffmpeg. When I extracted the optical flow with the TSN repo, I noticed that their script also extracts the frames, so I suppose it resizes the images to a fixed minimum dimension.
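
For reference, a resize of that kind with ffmpeg would look something like the line below; the 256-pixel short side and the file name pattern are my assumptions, not necessarily what TSN uses:

ffmpeg -i video.avi -vf "scale=-2:256" frames/img_%05d.jpg   # height fixed at 256, width follows the aspect ratio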

Now I have another question. I have finished training both the RGB and the Flow models, and I'd like to combine them to make predictions as in the paper. Please correct me if I'm wrong, but the eval_tpp_net_hmdb.py script only seems to evaluate the RGB or the Flow model separately, not the fused model described in your paper. Is there another script that combines both trained models for inference?

Thanks again!

zhujiagang commented 5 years ago

@csantosbh You can refer to the eval_scores_rgb_flow.py
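
The idea is a weighted late fusion of the per-video scores saved by the two evaluation runs. A rough sketch in Python (the .npz key names, file names, and the 1:1.5 weighting are illustrative assumptions, not necessarily what eval_scores_rgb_flow.py uses):

import numpy as np

# weighted late fusion: sum the per-class scores from both streams,
# then take the argmax per video
rgb = np.load('rgb_scores.npz')     # hypothetical output of the RGB run
flow = np.load('flow_scores.npz')   # hypothetical output of the flow run
fused = 1.0 * rgb['scores'] + 1.5 * flow['scores']
pred = fused.argmax(axis=1)
acc = (pred == rgb['labels']).mean()
print('fusion accuracy: {:.2%}'.format(acc))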

csantosbh commented 5 years ago

I see! eval_scores_rgb_flow.py complained about some missing .npz files, so I guess I should have changed the save_scores variable in the eval_tpp_net_hmdb.py script, right? So the whole process should be:

  1. Open eval_tpp_net_hmdb.py and edit the net_weights and save_scores variables for the RGB evaluation
  2. Run eval_tpp_net_hmdb.py
  3. Open eval_tpp_net_hmdb.py and edit the net_weights and save_scores variables for the Flow evaluation
  4. Run eval_tpp_net_hmdb.py
  5. Open eval_scores_rgb_flow.py and set the score_files variable to the names of the .npz files generated in steps 2 and 4
  6. Run eval_scores_rgb_flow.py

Is that correct? I'm running inference again, since my .npz file was overwritten when I ran the flow evaluation (I had only changed the net_weights variable).

zhujiagang commented 5 years ago

Yes!

csantosbh commented 5 years ago

Thank you @zhujiagang! I was able to reproduce the paper's results, reaching ~73% accuracy with the fusion model.

Now I'm trying to use the Kinetics pretrained model to improve accuracy, as in the paper. What I did was edit hmdb_scripts_split_1/train_rgb_tpp_delete_dropout_split_1.sh and change the --weights flag to point to kinetics_pretraining_models/bn_inception_kinetics_rgb_pretrained/bn_inception_kinetics_rgb_pretrained.caffemodel (I would do something similar for the Flow trainer). However, Caffe crashes when I try to run training like this. Is there anything else that needs to be changed?

zhujiagang commented 5 years ago

@csantosbh You should use kinetics_hmdb_split_1/train_kinetics_rgb_tpp_p124_split_1.sh instead, because the TSN authors provide the Kinetics pretrained model and prototxt at http://yjxiong.me/others/kinetics_action/ under different names from the ImageNet pretraining setup, so the training script and prototxt must match.
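
Assuming it is launched like the other training scripts:

bash kinetics_hmdb_split_1/train_kinetics_rgb_tpp_p124_split_1.sh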