video classification - Githubissues

Sun-Fan commented 5 years ago

Thanks for the code. Can it be used for video classification?

kaiyuyue commented 5 years ago

Thanks. Sure, it can be used for video classification, but some pieces of the code need to be revised.

Backbone:

Add a temporal dimension to each Conv2D to turn it into a Conv3D.
Fine-tune from ImageNet-pretrained 2D weights by inflating them into the Conv3D kernels; please refer to paper [0].
In the CGNL block, merge the feature tensor [B, C, T, H, W] into [B, CTHW] for video, just as the 2D version flattens the feature tensor for image-based tasks.
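
The backbone changes above can be sketched as follows. This is a minimal illustration, not code from the repository: the helper names `inflate_conv2d` and `flatten_for_cgnl` are hypothetical, the temporal kernel size of 3 is an assumption, and grouped or dilated convolutions are not handled. The 2D kernel is repeated along time and divided by the temporal size, so a clip made of identical frames produces the same response as the original 2D layer (away from the zero-padded temporal borders).

```python
import torch
import torch.nn as nn


def inflate_conv2d(conv2d: nn.Conv2d, time_dim: int = 3) -> nn.Conv3d:
    """Build a Conv3d whose kernel is the 2D kernel repeated over time.

    Hypothetical helper; assumes integer/tuple padding and no groups/dilation.
    """
    conv3d = nn.Conv3d(
        conv2d.in_channels,
        conv2d.out_channels,
        kernel_size=(time_dim, *conv2d.kernel_size),
        stride=(1, *conv2d.stride),
        padding=(time_dim // 2, *conv2d.padding),
        bias=conv2d.bias is not None,
    )
    with torch.no_grad():
        # [out, in, kH, kW] -> [out, in, T, kH, kW], scaled by 1/T so the
        # summed response over time matches the pretrained 2D response.
        weight = conv2d.weight.unsqueeze(2).repeat(1, 1, time_dim, 1, 1)
        conv3d.weight.copy_(weight / time_dim)
        if conv2d.bias is not None:
            conv3d.bias.copy_(conv2d.bias)
    return conv3d


def flatten_for_cgnl(x: torch.Tensor) -> torch.Tensor:
    """Merge a video feature tensor [B, C, T, H, W] into [B, C*T*H*W]."""
    return x.reshape(x.size(0), -1)
```

In a real backbone the pretrained 2D layers would be loaded first and then replaced one by one with their inflated counterparts.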

Data

When preparing the video data, pay attention to how the frames are extracted for the experiments; please extract them densely.
During training, the sampling strategy is kept the same as the one used in the non-local network; please refer to paper [1].
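
A rough sketch of clip sampling over densely extracted frames is shown below. The function name `sample_clip_indices` is hypothetical, and the defaults (32 frames taken with stride 2 from a random 64-frame window) are an assumption about the protocol in paper [1], so check them against the paper before relying on them. Wrapping around for short videos is also a choice made here for illustration.

```python
import random


def sample_clip_indices(num_frames, clip_len=32, stride=2, rng=random):
    """Pick frame indices for one training clip.

    A random window of clip_len * stride consecutive frames is chosen,
    then every `stride`-th frame is kept. Videos shorter than the window
    are looped (an assumption, not taken from the thread).
    """
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    span = clip_len * stride
    max_start = max(num_frames - span, 0)
    start = rng.randint(0, max_start)  # inclusive on both ends
    return [(start + i * stride) % num_frames for i in range(clip_len)]
```

The returned indices refer to the extracted frame files, so the loader only needs to read `clip_len` images per clip.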

Training & Testing

Re-initializing the fc layer is recommended on large datasets.
A warmup strategy helps boost performance.
At test time, it is very important to use fully-convolutional inference, which is described in paper [1] and used in its GitHub repo.
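
Two of these points can be sketched in plain Python. Both helpers are hypothetical and their defaults are assumptions: `warmup_lr` ramps the learning rate linearly from a fraction of the base value, and `uniform_clip_starts` spreads test clips evenly over the video so that their softmax scores can be averaged. The choice of 10 clips per video is an assumption about the protocol in paper [1].

```python
def warmup_lr(step, base_lr, warmup_steps, start_factor=0.1):
    """Linear warmup: start at base_lr * start_factor, reach base_lr
    after warmup_steps, then stay constant (decay is handled elsewhere)."""
    if step >= warmup_steps:
        return base_lr
    alpha = step / warmup_steps
    return base_lr * (start_factor + (1.0 - start_factor) * alpha)


def uniform_clip_starts(num_frames, span, num_clips=10):
    """Start indices of num_clips windows of `span` frames, evenly spaced
    from the beginning of the video to the last valid start position."""
    last = max(num_frames - span, 0)
    if num_clips == 1:
        return [last // 2]
    return [round(i * last / (num_clips - 1)) for i in range(num_clips)]
```

At test time each clip would be passed through the network on the full (uncropped) frame, and the per-clip scores averaged to give the video-level prediction.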

References

595448755 commented 5 years ago

Hello, Mr. Yue, can you share a test_video? Thanks.

kaiyuyue commented 5 years ago

@595448755 Hi, what do you mean by a test_video?

lxtGH commented 5 years ago

Hi! What was your experiment time on ActivityNet, with ResNet50/ResNet101 as the backbone? @KaiyuYue

kaiyuyue commented 5 years ago

> Hi! What was your experiment time on ActivityNet, with ResNet50/ResNet101 as the backbone? @KaiyuYue

Hi, I have not run experiments on ActivityNet, but I remember the time cost on ImageNet: training ResNet-152 with 1 CGNL block added takes about half an hour longer than training the plain ResNet-152.

lxtGH commented 5 years ago

Hi! Thanks for the reply. What is the time cost on Mini-Kinetics, and what is your hardware? Also, what is the training time on ImageNet?

kaiyuyue commented 5 years ago

Training ResNet-152 on ImageNet takes about 2.5 days in total on 8 Nvidia Tesla V100 GPUs (16 GB memory). Training ResNet-152 with 1 CGNL block takes roughly 30 to 60 minutes longer than that.

lxtGH commented 5 years ago

Hi! Thanks again for your reply. What is the time cost with other backbones, such as ResNet-50, ResNet-101, or lighter networks like MobileNet?

kaiyuyue commented 5 years ago

I forget the training time for ResNet-50 on ImageNet, and I have not run experiments with ResNet-101, MobileNet, ShuffleNet, etc. as the backbone on ImageNet. Sorry about that.

595448755 commented 5 years ago

@KaiyuYue
I mean that there is no webcam demo script to run a test_video.mp4. How do I run a *.mp4 video?

kaiyuyue commented 5 years ago

@595448755

> I mean that there is no webcam demo script to run a test_video.mp4. How do I run a *.mp4 video?

Sorry for the delayed response. I have no specific script for running an inference demo in the wild, so you will need to write the code yourself. It's easy, just like using a model trained on ImageNet to classify objects in the wild: keep the dataloader flow the same for inference, and make the inference code portable enough to output the recognized labels.
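
As a starting point, the inference step could look like the sketch below. It is not a script from the repository: `classify_clip` is a hypothetical helper, the model is assumed to take a [B, C, T, H, W] tensor and return class logits, and decoding the *.mp4 into frames (with any video reader) plus the same resizing and normalization used in training are left to the caller.

```python
import torch


@torch.no_grad()
def classify_clip(model, clip, class_names, topk=5):
    """Return the top-k (label, probability) pairs for one video clip.

    clip: float tensor [C, T, H, W], preprocessed like the training data.
    class_names: list mapping class index to a readable label.
    """
    model.eval()
    logits = model(clip.unsqueeze(0))        # add batch dim -> [1, num_classes]
    probs = torch.softmax(logits, dim=1)[0]  # [num_classes]
    k = min(topk, probs.numel())
    scores, indices = probs.topk(k)
    return [(class_names[i], s) for i, s in zip(indices.tolist(), scores.tolist())]
```

For a long video, the same function can be called on several clips and the probabilities averaged before taking the top-k.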

595448755 commented 5 years ago

@KaiyuYue Thanks for sharing.

kaiyuyue / cgnl-network.pytorch

video classification #4