kenshohara / 3D-ResNets-PyTorch

3D ResNets for Action Recognition (CVPR 2018)
MIT License

How to calculate UCF101's top1 video level accuracy #270

Closed: YTHmamba closed this issue 1 year ago

YTHmamba commented 1 year ago

Hello, I tried to calculate UCF101's top-1 video-level accuracy by running the following commands:

Calculate top-5 class probabilities of each video using a trained model (~/data/results/save_200.pth).
Note that inference_batch_size should be small, because the actual batch size is calculated as inference_batch_size * (n_video_frames / inference_stride); a short numeric sketch follows the command below.

python main.py --root_path ~/data --video_path kinetics_videos/jpg --annotation_path kinetics.json \
--result_path results --dataset kinetics --resume_path results/save_200.pth \
--model_depth 50 --n_classes 700 --n_threads 4 --no_train --no_val --inference --output_topk 5 --inference_batch_size 1
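
To make that batch-size note concrete, here is a minimal sketch; only inference_batch_size is taken from the command above, while the frame count and stride are assumed values for illustration:

# Illustration of: actual batch size = inference_batch_size * (n_video_frames / inference_stride)
inference_batch_size = 1   # value passed with --inference_batch_size above
n_video_frames = 160       # assumed number of frames in one video (not from the issue)
inference_stride = 16      # assumed inference stride (not from the issue)
actual_batch_size = inference_batch_size * (n_video_frames // inference_stride)
print(actual_batch_size)   # -> 10, i.e. ten clips go through the model in one forward pass
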
Evaluate top-1 video accuracy of a recognition result (~/data/results/val.json).

python -m util_scripts.eval_accuracy ~/data/kinetics.json ~/data/results/val.json 

Then the load_ground_truth function in util_scripts/eval_accuracy.py returns a ground_truth of length 0. I think the following part of the code needs to be changed. Can someone tell me how to change it?

for video_id, v in data['database'].items():
    if subset != v['subset']:
        continue
    this_label = v['annotations']['label']
    ground_truth.append((video_id, class_labels_map[this_label]))
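
For reference, here is a minimal check, not part of the repository, that prints how many videos the annotation file lists per subset string; the path is the one passed to eval_accuracy in the commands above:

import json
import os
from collections import Counter

annotation_path = os.path.expanduser('~/data/kinetics.json')  # annotation file used above
with open(annotation_path) as f:
    data = json.load(f)

# Count entries per subset string ('training', 'validation', ...)
subset_counts = Counter(v['subset'] for v in data['database'].values())
print(subset_counts)

If the subset value that load_ground_truth filters on never appears in that counter, the check at the top of the loop skips every video and ground_truth stays empty even though the loop itself is correct.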