Closed Sushant-aggarwal closed 5 years ago
Initial accuracy depends on the model and width_multiplier you select. When fine-tuning models pretrained on the Kinetics dataset, you can reach decent accuracy within 5-10 epochs. For the sake of consistency, we fine-tuned the UCF models for 40 epochs and reported the best accuracy achieved.
For the spatial cropping part: did you crop the same portion from all the frames of a single video, or is that also random?
All training is done clip-wise, and random cropping is applied to every clip during training. For video classification at validation time, non-overlapping consecutive clips are passed to the network and their scores are averaged at the end. For video evaluation, center cropping is applied to all clips.
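The video-level evaluation described above can be sketched as follows; `model` is a hypothetical callable that maps one clip to a vector of class scores (not a function from this repo):

```python
import numpy as np

def video_score(frames, model, clip_len=16):
    """Average class scores over non-overlapping consecutive clips.

    frames: array of shape (num_frames, C, H, W), already center-cropped.
    model: hypothetical callable mapping a clip to class scores.
    """
    scores = []
    # Step through the video in non-overlapping clip_len windows
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        clip = frames[start:start + clip_len]  # one 16-frame clip
        scores.append(model(clip))             # per-clip class scores
    # Video-level prediction is the mean of the per-clip scores
    return np.mean(scores, axis=0)
```

Any trailing frames shorter than one clip are simply dropped in this sketch.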
@okankop I am trying to replicate the fine-tuning experiment on UCF-101 as per the methods specified, but I have been unable to reach any significant accuracy (not even on the training data), even after several iterations. Could you please give any guidance? To simplify, I used only 10 classes from UCF-101, extracted 16 contiguous frames from each video starting at a (feasible) random point each epoch, randomly cropped a 112x112 patch from the video resized to 256 each epoch, used the Adam optimiser with an initial learning rate of 1e-4, and applied mean subtraction and division by 255 to the data. I have an M2000 GPU.
Hi @Sushant-aggarwal, you can simply use the example provided in the README. You just need to download the UCF-101 dataset and the Kinetics-pretrained model, then run the following command:
```shell
python main.py --root_path ~/ \
    --video_path ~/datasets/jester \
    --annotation_path Efficient-3DCNNs/annotation_UCF101/ucf101_01.json \
    --result_path Efficient-3DCNNs/results \
    --pretrain_path Efficient-3DCNNs/results/kinetics_shufflenet_0.5x_G3_RGB_16_best.pth \
    --dataset ucf101 \
    --n_classes 600 \
    --n_finetune_classes 101 \
    --ft_portion last_layer \
    --model shufflenet \
    --groups 3 \
    --width_mult 0.5 \
    --train_crop random \
    --learning_rate 0.1 \
    --sample_duration 16 \
    --batch_size 64 \
    --n_threads 16 \
    --checkpoint 1 \
    --n_val_samples 1
```
I believe the issue is cleared up. If something is still unclear, you can reopen the issue.
How many epochs did it take to reach decent accuracy, and what was the initial accuracy?