mzolfaghari / ECO-efficient-video-understanding

Code and models of the paper "ECO: Efficient Convolutional Network for Online Video Understanding", ECCV 2018
MIT License

Some questions about the ECO Lite arch & training details #3

Closed by zhang-can 6 years ago

zhang-can commented 6 years ago

Hi @mzolfaghari! Thanks for your excellent work, and I'm glad you will release your code here. I have some questions about the ECO Lite architecture and training details:

  1. The 2D-Net uses the BN-Inception architecture (up to the inception-3c layer) and its output has 96 channels. Does this mean that only one branch (output channels: 96) of the inception-3c layer is used?

  2. For the original 3D-ResNet18, down-sampling is performed by conv3_1, conv4_1 and conv5_1. I noticed that in Table 1 of the supplementary material the output size of the conv3_x layers is 28x28xN, so it seems that conv3_1 does not down-sample and uses stride 1x1x1. Is that right?

  3. I am trying to implement the ECO network in PyTorch. I initialize the weights of the 2D-Net with the BN-Inception model pretrained on Kinetics provided by tsn-pytorch, and the 3D-Net with the Kinetics-pretrained 3D-ResNet18 model provided by 3D-ResNets-PyTorch; a sketch of this initialization is below. When I then train on UCF101, the training loss drops well but the test loss is bad, which looks like overfitting. (I noticed that on page 7 of the paper, after initializing the weights from the different pretrained models, you train ECO and ECO Lite on the Kinetics dataset for 10 epochs. I did not do this step. Could that cause the overfitting?)
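For reference, this is roughly what my initialization does; the submodule prefixes "base_2d" / "res_3d" are placeholders for my own model, not names from your code:

    import torch

    def init_from_pretrained(eco_model, bninception_ckpt_path, resnet3d_ckpt_path):
        # Copy matching 2D weights from the BN-Inception checkpoint and matching 3D weights
        # from the 3D-ResNet18 checkpoint into the ECO model (only a sketch of my code).
        model_state = eco_model.state_dict()
        for ckpt_path, prefix in [(bninception_ckpt_path, "base_2d."),
                                  (resnet3d_ckpt_path, "res_3d.")]:
            ckpt = torch.load(ckpt_path, map_location="cpu")
            ckpt = ckpt.get("state_dict", ckpt)            # some checkpoints wrap weights in "state_dict"
            for k, v in ckpt.items():
                name = prefix + k.replace("module.", "")   # strip a DataParallel prefix if present
                if name in model_state and model_state[name].shape == v.shape:
                    model_state[name] = v
        eco_model.load_state_dict(model_state)
        return eco_model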

zhang-can commented 6 years ago

I don't have the Kinetics dataset, so after initializing the weights I trained on the Something-Something dataset and then fine-tuned the model on UCF101, but I still got bad performance.

Params settings:

Training Results: [image]

Testing Results: [image]

mzolfaghari commented 6 years ago

Hi @zhang-can,

1. Yes, we just used one branch of the inception-3c layer.

2. Good point. We adjusted the Res3a layer for consistency as follows:

######## Res3a ###########
layer {
  name: "res3a_2n"
  bottom: "res2b_bn"
  top: "res3a"
  type: "Convolution"
  convolution_param {
    num_output: 128
    pad: [1, 1, 1]
    kernel_size: [3, 3, 3]
    stride: [1, 1, 1]
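    # stride 1x1x1: no down-sampling here, so the conv3_x output stays at 28x28 (cf. Table 1 of the supplementary)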
    weight_filler{
       type: "xavier"
    }
    bias_filler{
       type: "constant"
       value: 0
    }
  }
  param { lr_mult: 1 decay_mult: 1 } param { lr_mult: 2 decay_mult: 0 }
}

layer {
  name: "res3a_bn"
  bottom: "res3a"
  top: "res3a_bn"
  type: "BN"
  param { lr_mult: 1 decay_mult: 0 } param { lr_mult: 1 decay_mult: 0 }
  bn_param { 
    frozen: false 
    slope_filler { type: "constant" value: 1 } 
    bias_filler { type: "constant" value: 0 } 
  } 
}
layer {
  name: "res3a_relu"
  bottom: "res3a_bn"
  top: "res3a_bn"
  type: "ReLU"
}
###########################

The rest is exactly the same as the original 3D-ResNet18.

3. Actually, pre-training on the Kinetics dataset is very important for achieving the reported results. However, we also have results with pre-training on Something-Something, and you should be able to get at least around 70% accuracy without a major overfitting problem. Note that ECO (16) gives 29.5% accuracy when trained from scratch (do not take this as a final number; it was a preliminary test and we did not search the hyper-parameters enough for training from scratch on Something-Something).

In your figures, the training loss does not go below 1, which shows that something is wrong. The root of the problem could be in your settings or in the implementation of the model. We used a dropout of 0.7 and a learning rate of 0.001, and in addition we trained on the Kinetics dataset, so your setup is different.

Note that for the 16-frame case the 3D network should receive input of size (batch_size) x 96 x 16 x 28 x 28.
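A quick sanity check in PyTorch would be something like this sketch (model_3d is a placeholder for whatever module implements the 3D part in your code):

    import torch

    def check_3d_input(model_3d: torch.nn.Module, batch_size: int = 2, num_frames: int = 16):
        # The 3D part of ECO (16-frame setting) should accept a (batch_size, 96, 16, 28, 28) tensor.
        x = torch.randn(batch_size, 96, num_frames, 28, 28)
        with torch.no_grad():
            out = model_3d(x)
        print("3D-net output shape:", tuple(out.shape))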

If you could send me your log or code then I'll have a look.

zhang-can commented 6 years ago

Hi @mzolfaghari , Thanks for your reply!

Today I tried increasing the dropout ratio and decreasing the learning rate, but that did not seem to help. Finally I solved the over-fitting problem by clipping gradients at 5 (sketched below). Training and testing results are shown further down.
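The clipping I added looks roughly like this sketch; the model / criterion / optimizer objects come from my own training loop and are only illustrative here:

    import torch
    from torch.nn.utils import clip_grad_norm_

    def train_step(model, criterion, optimizer, inputs, targets, clip_gradient=5.0):
        # One training step with the "clip gradient 5" mentioned above: the total gradient
        # norm is clipped to 5 before the optimizer update.
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        clip_grad_norm_(model.parameters(), clip_gradient)
        optimizer.step()
        return loss.item()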

After initializing the weights with the Kinetics-pretrained models (BN-Inception + 3D-ResNet18), I train ECO directly on UCF101 train_split_1 and test on UCF101 val_split_1. The best test prec@1 is around 65% and the final test loss is around 1.5. Does that result look OK? What else can I do to increase the accuracy? From your advice, I think that training ECO Lite for several more epochs on Kinetics before training on UCF101 might work well, but the Kinetics dataset is hard for me to download.

Params settings: [image]

zhang-can commented 6 years ago

I uploaded my code to this repo: https://github.com/zhang-can/ECO-pytorch

and here is my log file for the above result: UCF101-20180607-21-20.log

Any help is appreciated. Thank you very much!

mzolfaghari commented 6 years ago

@zhang-can I don't see where you're finetuning or initializing with pre-trained models?!

zhang-can commented 6 years ago

Hi @mzolfaghari, thanks for your reply! I load the 2D and 3D pretrained models when initializing the ECO class; see below:

https://github.com/zhang-can/ECO-pytorch/blob/master/tf_model_zoo/ECO/pytorch_load.py#L11:L12 https://github.com/zhang-can/ECO-pytorch/blob/master/tf_model_zoo/ECO/pytorch_load.py#L49:L50

Finally, I have downloaded the Kinetics dataset and I'm going to train ECO Lite on it. Could you tell me what hyper-parameters you used for training on Kinetics?

mzolfaghari commented 6 years ago

Learning rate: 0.001, step size: 4 epochs, max iterations: 10 epochs, clip gradient: 30, solver: Nesterov (for the 16-frame network).
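In PyTorch terms this corresponds roughly to the sketch below (we train with Caffe; the momentum value and the LR decay factor here are assumptions, not from our solver file):

    import torch
    import torch.nn as nn

    def train_on_kinetics(model: nn.Module, train_loader, num_epochs=10):
        # Rough PyTorch equivalent of the Caffe solver settings above (16-frame network).
        # Momentum 0.9 and decay factor 0.1 are assumptions; the rest follows the listed settings.
        criterion = nn.CrossEntropyLoss()
        optimizer = torch.optim.SGD(model.parameters(), lr=0.001,        # learning rate 0.001
                                    momentum=0.9, nesterov=True)         # Nesterov solver
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.1)  # step: 4 epochs
        for epoch in range(num_epochs):                                   # max: 10 epochs
            for inputs, targets in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(inputs), targets)
                loss.backward()
                torch.nn.utils.clip_grad_norm_(model.parameters(), 30)    # clip gradient: 30
                optimizer.step()
            scheduler.step()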

I started training your network on the Kinetics dataset, but it seems that training starts from scratch and the initialization does not work; the training loss does not look right. I will investigate the problem in depth later.

zhang-can commented 6 years ago

I tested the code on a new machine; here is the log output showing the two pretrained models being downloaded (outlined with a red rectangle).

[image]

For your reference, I printed the "un_init_dict_keys" list at this line and got an empty result, which means all layers are initialized from the two downloaded pretrained models. So maybe you can do something like this to find the problem.
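The check is essentially the following sketch, where loaded_state stands for the merged dict of 2D + 3D pretrained weights in my code:

    import torch.nn as nn

    def un_init_dict_keys(model: nn.Module, loaded_state: dict):
        # Names of parameters/buffers that were NOT overwritten by the pretrained weights;
        # an empty list means every layer came from the two downloaded models.
        return [k for k in model.state_dict() if k not in loaded_state]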

mzolfaghari commented 6 years ago

In line 156 of ECO.yaml you're connecting the output of the 2D network to the 3D network without any change. How do you make sure the 3D network is doing 3D convolutions on the feature maps of a sequence and not on the feature maps of a single image? A permutation of the dimensions seems necessary ([0,1,2,3,4] --> [0,2,1,3,4]), but before that permutation you need to reshape the output of the 2D network to the proper format: reshape it to N x 16 (or 4) x 96 x 28 x 28, then permute.
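In PyTorch the fix would look roughly like this sketch (tensor and argument names are illustrative):

    import torch

    def to_3d_input(feat_2d: torch.Tensor, batch_size: int, num_segments: int) -> torch.Tensor:
        # feat_2d: output of the 2D net, shape (batch_size * num_segments, 96, 28, 28).
        # Reshape to (batch_size, num_segments, 96, 28, 28), then permute dims
        # [0,1,2,3,4] -> [0,2,1,3,4] so the 3D convolutions see a feature-map sequence.
        x = feat_2d.view(batch_size, num_segments, 96, 28, 28)
        return x.permute(0, 2, 1, 3, 4).contiguous()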

Another good experiment would be to compare: training from scratch, initializing with only the 2D weights, initializing with only the 3D weights, and initializing with both. Have you done such an experiment? If yes, what was the difference?

zhang-can commented 6 years ago

Hi @mzolfaghari, I did the permutation in the ECO class's "forward" function, because PyTorch doesn't have a "reshape" layer:

https://github.com/zhang-can/ECO-pytorch/blob/master/tf_model_zoo/ECO/pytorch_load.py#L130-L134

The output of the 2D-Net is a 4-d tensor of shape (batch_size*N) x 96 x 28 x 28; I reshape and transpose it to a 5-d tensor of shape batch_size x 96 x N x 28 x 28 before sending it to the 3D-Net.

Good ideas! I have only done the "from scratch" and "initialize with both" experiments, but I did not visualize the "from scratch" one. I will try the other two experiments and give you some feedback after that.

zhang-can commented 6 years ago

Should I run all four experiments with the same hyper-parameters?

mzolfaghari commented 6 years ago

For the comparison it would be better to do all experiments with the same hyper-parameters.

zhang-can commented 6 years ago

Hi @mzolfaghari!

Sorry for the late reply. I have only tested the ECO-Lite-4F architecture, which achieves ~85% accuracy (vs. 87.4% in your paper). I wonder if it's because my Kinetics-pretrained model is too weak (prec@1: 37.7%).

[image: eco-lite-4f-prec-1]

BTW, if you need it, it would be my pleasure to help you develop the PyTorch version of ECO.

mzolfaghari commented 6 years ago

Hi @zhang-can,

The model was not pretrained very well on the Kinetics dataset. With ECO-Lite and 4 frames you should be able to get more than 52% accuracy.

Best,

zhang-can commented 6 years ago

Hi @mzolfaghari , I've sent you an email.

Tord-Zhang commented 6 years ago

@zhang-can Thank you so much for the PyTorch implementation of ECO. Could you share the trained ECO models?

zhangtou commented 1 year ago

Thank you very much for the PyTorch implementation of ECO. Could you share the trained ECO models?

Hello, has your problem been solved?