Can't reproduce Diving48 training

Hello!

I am trying to reproduce your model training on Diving48, but the results are far off: in the log below, validation top1_acc is only 0.1025 at epoch 50. I used your Diving48 config file, vitclip_base_diving48.py, in two variants: (1) the original version and (2) clip len = 8, frame interval = 8. The command was:

bash tools/dist_train.sh <PATH/TO/CONFIG> <NUM_GPU> --test-best --validate --cfg-options work_dir=<PATH/TO/OUTPUT>

What could be causing this? Please let me know. Thank you.

Environment info: Python 3.9.13, PyTorch 1.10.0, CUDA 11.3

Here is part of the log:

2023-03-08 21:00:08,793 - mmaction - INFO - Epoch [50][540/627] lr: 2.960e-07, eta: 0:01:07, time: 0.698, data_time: 0.000, memory: 19879, top1_acc: 0.3833, top5_acc: 0.8187, loss_cls: 1.8822, loss: 1.8822
2023-03-08 21:00:22,859 - mmaction - INFO - Epoch [50][560/627] lr: 2.960e-07, eta: 0:00:51, time: 0.703, data_time: 0.006, memory: 19879, top1_acc: 0.3604, top5_acc: 0.8083, loss_cls: 1.9930, loss: 1.9930
2023-03-08 21:00:36,925 - mmaction - INFO - Epoch [50][580/627] lr: 2.960e-07, eta: 0:00:36, time: 0.703, data_time: 0.006, memory: 19879, top1_acc: 0.3688, top5_acc: 0.8250, loss_cls: 1.9158, loss: 1.9158
2023-03-08 21:00:50,874 - mmaction - INFO - Epoch [50][600/627] lr: 2.960e-07, eta: 0:00:20, time: 0.697, data_time: 0.000, memory: 19879, top1_acc: 0.4146, top5_acc: 0.8438, loss_cls: 1.8412, loss: 1.8412
2023-03-08 21:01:04,933 - mmaction - INFO - Epoch [50][620/627] lr: 2.960e-07, eta: 0:00:05, time: 0.703, data_time: 0.006, memory: 19879, top1_acc: 0.3896, top5_acc: 0.8125, loss_cls: 1.9206, loss: 1.9206
2023-03-08 21:01:10,258 - mmaction - INFO - Saving checkpoint at 50 epochs
2023-03-08 21:03:12,731 - mmaction - INFO - Evaluating top_k_accuracy ...
2023-03-08 21:03:12,740 - mmaction - INFO - top1_acc 0.1025 top5_acc 0.3548
2023-03-08 21:03:12,740 - mmaction - INFO - Evaluating mean_class_accuracy ...
2023-03-08 21:03:12,741 - mmaction - INFO - mean_acc 0.0586
2023-03-08 21:03:12,799 - mmaction - INFO - The previous best checkpoint /data/aim/outputs/diving48/best_top1_acc_epoch_45.pth was removed
2023-03-08 21:03:14,531 - mmaction - INFO - Now best checkpoint is saved as best_top1_acc_epoch_50.pth.
2023-03-08 21:03:14,531 - mmaction - INFO - Best top1_acc is 0.1025 at 50 epoch.
2023-03-08 21:03:14,532 - mmaction - INFO - Epoch(val) [50][985] top1_acc: 0.1025, top5_acc: 0.3548, mean_class_accuracy: 0.0586
2023-03-08 21:03:15,535 - mmaction - INFO - Warning: test_best set as True, but is not applicable (eval_hook.best_ckpt_path is None)
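
To make the gap between training and validation accuracy easier to see, here is a minimal sketch that pulls top1_acc out of log text in the format shown above and compares the two per epoch. The helper name summarize and both regular expressions are my own assumptions based only on this excerpt; they are not part of mmaction and may need adjusting for other log layouts.

```python
import re

# Excerpt copied from the log above: two training entries and the
# validation summary for epoch 50.
LOG = """\
2023-03-08 21:00:50,874 - mmaction - INFO - Epoch [50][600/627] lr: 2.960e-07, eta: 0:00:20, time: 0.697, data_time: 0.000, memory: 19879, top1_acc: 0.4146, top5_acc: 0.8438, loss_cls: 1.8412, loss: 1.8412
2023-03-08 21:01:04,933 - mmaction - INFO - Epoch [50][620/627] lr: 2.960e-07, eta: 0:00:05, time: 0.703, data_time: 0.006, memory: 19879, top1_acc: 0.3896, top5_acc: 0.8125, loss_cls: 1.9206, loss: 1.9206
2023-03-08 21:03:14,532 - mmaction - INFO - Epoch(val) [50][985] top1_acc: 0.1025, top5_acc: 0.3548, mean_class_accuracy: 0.0586
"""

# Hypothetical patterns inferred from the excerpt, not from mmaction itself.
# Training entries look like "Epoch [50][600/627] ... top1_acc: 0.4146".
TRAIN_RE = re.compile(r"Epoch \[(\d+)\]\[\d+/\d+\].*?top1_acc: ([\d.]+)")
# Validation entries look like "Epoch(val) [50][985] top1_acc: 0.1025".
VAL_RE = re.compile(r"Epoch\(val\) \[(\d+)\]\[\d+\].*?top1_acc: ([\d.]+)")


def summarize(log_text):
    """Return {epoch: (mean training top1, validation top1)}.

    Only epochs that have both training and validation entries are kept.
    """
    train = {}  # epoch -> list of training top1_acc values
    for match in TRAIN_RE.finditer(log_text):
        train.setdefault(int(match.group(1)), []).append(float(match.group(2)))

    val = {}  # epoch -> validation top1_acc
    for match in VAL_RE.finditer(log_text):
        val[int(match.group(1))] = float(match.group(2))

    return {
        epoch: (sum(train[epoch]) / len(train[epoch]), val[epoch])
        for epoch in sorted(val)
        if epoch in train
    }


if __name__ == "__main__":
    for epoch, (train_top1, val_top1) in summarize(LOG).items():
        print(f"epoch {epoch}: train top1 {train_top1:.4f}, val top1 {val_top1:.4f}")
```

In this excerpt the training top1_acc is several times higher than the validation top1_acc, which is the discrepancy I am asking about. The sketch only reports the numbers; it does not say anything about the cause.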

taoyang1122 / adapt-image-models, issue #5