vt-vl-lab / video-data-aug

Learning Representational Invariances for Data-Efficient Action Recognition
Apache License 2.0

How long does the training take for semi-supervised setting? (e.g. 20%) #1

Closed lambert-x closed 3 years ago

lambert-x commented 3 years ago

Hi, thanks for your great work. How long does training take with your configuration? I am training with the semi-supervised 20% (UCF-101) setting, and the process seems extremely slow.

Yuliang-Zou commented 3 years ago

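Yes, the bottleneck is loading the human detections and applying the corresponding augmentation. You can disable this augmentation with the following modifications. Take 20% UCF as an example:

  • Remove L64 and L69.
  • Remove the suffix "WithBox" in L65-L68.
  • Halve the number of training epochs in L154.

Then the training time should be similar to supervised training on the whole UCF dataset.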
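
For readers who prefer to see the shape of such an edit, here is a minimal sketch. It assumes the lines in question belong to an MMAction-style pipeline, i.e. a list of dicts with a "type" key. The step names `LoadHumanBoxes` and `HumanCentricAug`, the helper `strip_box_steps`, and the epoch count of 360 are hypothetical placeholders rather than names or values taken from the repository. The sketch drops the detection-dependent steps, strips the "WithBox" suffix from the remaining ones, and halves the epoch count.

```python
import copy


def strip_box_steps(pipeline, drop_types, suffix="WithBox"):
    """Return a new pipeline without box-dependent steps.

    Steps whose type is in `drop_types` are removed; any remaining step
    whose type ends with `suffix` is renamed to its box-free variant.
    The input list is left unmodified.
    """
    cleaned = []
    for step in pipeline:
        if step["type"] in drop_types:
            continue  # drop the detection loading / box-based augmentation
        step = copy.deepcopy(step)
        if step["type"].endswith(suffix):
            step["type"] = step["type"][: -len(suffix)]
        cleaned.append(step)
    return cleaned


# Hypothetical pipeline: every name and value here is a placeholder.
pipeline = [
    dict(type="LoadHumanBoxes"),                  # hypothetical detection loader
    dict(type="ResizeWithBox", scale=(-1, 256)),
    dict(type="RandomCropWithBox", size=224),
    dict(type="FlipWithBox", flip_ratio=0.5),
    dict(type="HumanCentricAug"),                 # hypothetical box-based augmentation
]
total_epochs = 360  # hypothetical value

new_pipeline = strip_box_steps(
    pipeline, drop_types={"LoadHumanBoxes", "HumanCentricAug"}
)
new_total_epochs = total_epochs // 2  # halve the training schedule
```

In practice the same change is made by editing the config file by hand. The helper only illustrates which entries go away and which are renamed.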

lambert-x commented 3 years ago

Thank you very much. By the way, do you have a reference result for this setting? I get top-1: 34.66 / top-5: 58.95 and want to check whether this result makes sense.

PeiqinZhuang commented 3 years ago

> Yes, the bottleneck is loading the human detections and applying the corresponding augmentation. You can disable this augmentation with the following modifications. Take 20% UCF as an example:
>
>   • Remove L64 and L69.
>   • Remove the suffix "WithBox" in L65-L68.
>   • Halve the number of training epochs in L154.
>
> Then the training time should be similar to supervised training on the whole UCF dataset.

Hi, how long does training take on the whole UCF dataset, e.g., with the config r2plus1d_r34_8x8x1_180e_ucf101_rgb? This would help me judge whether my own training is too slow.

Yuliang-Zou commented 3 years ago

@PeiqinZhuang ~2 days should be enough.