jfzhang95 / pytorch-video-recognition

PyTorch implementation of C3D, R3D, and R2Plus1D models for video activity recognition.
MIT License

Evaluation on official split 01 of UCF101 #14

Open wave-transmitter opened 5 years ago

wave-transmitter commented 5 years ago

Hello,

as mentioned in the description, sklearn is used to split the train/val/test data for each dataset. Has anybody tried to train and evaluate the C3D model on the official split 01 of UCF101? I gave it a try, with the validation set being the same as the test set, and got the following results:

(screenshot of results)

Is the official split so different from the sklearn split that it justifies the much lower accuracy?
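For context, here is a minimal sketch of how the two splitting strategies differ. It assumes the official UCF101 annotation files `trainlist01.txt` and `testlist01.txt` (from the `ucfTrainTestlist` archive) are available locally; the paths and variable names are only illustrative, not this repo's actual code.

```python
from pathlib import Path
from sklearn.model_selection import train_test_split

# Official split 01: train/test membership is fixed by the annotation files.
# trainlist01.txt lines look like "ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1",
# testlist01.txt lines contain only the relative video path.
split_dir = Path("ucfTrainTestlist")  # hypothetical local path
official_train = [line.split()[0] for line in (split_dir / "trainlist01.txt").read_text().splitlines() if line.strip()]
official_test = [line.strip() for line in (split_dir / "testlist01.txt").read_text().splitlines() if line.strip()]

# Random per-video split (roughly what an sklearn-based split does): videos from
# the same group can end up on both sides, which makes the test set easier.
all_videos = official_train + official_test
random_train, random_test = train_test_split(all_videos, test_size=0.2, random_state=42)
```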

amm040341 commented 5 years ago

I have tried the UCF101 official split (only training set and testing set). The test accuracy is only 45%, and I am trying to figure out what is going on. Can anyone give some suggestions? Thanks a lot!

amm040341 commented 5 years ago

I think it is caused by the size of the dataset, because UCF101 is relatively small compared with other datasets, e.g. Sports-1M or Kinetics. So you will get higher accuracy if you train on another, larger dataset and fine-tune on UCF101. Reference: https://github.com/kenshohara/3D-ResNets-PyTorch/issues/85

wave-transmitter commented 5 years ago

Hello @amm040341, to be honest I can't see the connection between fine-tuning the model and low performance. It doesn't sound right to me.

In my opinion, as mentioned in #16, the random split that @jfzhang95 applied leads to videos of the same group appearing in both training and testing, whereas, as mentioned on the UCF101 website:

> The videos from the same group may share some common features, such as similar background, similar viewpoint, etc.

I think this is the reason why such high accuracy was achieved in the results presented in the README. Using the provided official splits leads to lower performance, as you can see in the results I posted, since videos of the same group are not mixed across training and testing and the task becomes more challenging. A quick way to check for this leakage is sketched below.
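A rough diagnostic sketch: compare the UCF101 group IDs (the `gXX` part of each filename, e.g. `v_ApplyEyeMakeup_g08_c01.avi`) that appear on both sides of a split. `train_videos`/`test_videos` are assumed to be lists of UCF101 filenames from whichever split you want to check; the helper names are mine.

```python
import re

def group_id(filename: str) -> str:
    """Return the class+group part of a UCF101 filename, e.g. 'ApplyEyeMakeup_g08'."""
    m = re.search(r"v_(.+)_(g\d+)_c\d+", filename)
    return f"{m.group(1)}_{m.group(2)}"

def group_overlap(train_videos, test_videos):
    """Return the groups that contribute clips to both the training and test sets."""
    train_groups = {group_id(v) for v in train_videos}
    test_groups = {group_id(v) for v in test_videos}
    return train_groups & test_groups

# For the official splits this overlap is empty; for a random per-video split
# it is typically large, which inflates the reported test accuracy.
```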

Regarding the low accuracy you reported in your experiments, it's impossible to know why this is happening. You would have to provide much more information about your setup for other users to be able to help.

amm040341 commented 5 years ago

Hi, first of all, sorry for not giving enough information before asking for help. I used ResNet-18 to train on UCF101 official split 1. The reference (last comment) is also an action recognition codebase. Its author mentioned that "When I train ResNet-18 on UCF-101 from scratch, I got around 80% and 40% accuracies on the training and validation sets, respectively. Because UCF-101 is a relatively small dataset to train 3D CNNs from scratch, it causes overfitting. I think the training on UCF-Sports is similar."

Now I use a bigger dataset to pretrain and then fine-tune on UCF101, and it has indeed improved my accuracy.

leftthomas commented 5 years ago

@amm040341 I also encountered the same problem: when I train the C3D model with the official train/test split 1, the testing accuracy is very low.

wave-transmitter commented 5 years ago

@amm040341 Do you use the pretrained model provided by @jfzhang95? If not, I suppose it's not so straightforward to get similar results by training from scratch. As reported here, the model was pretrained on Kinetics and then fine-tuned on the UCF101 dataset.

@all In case you use the pretrained C3D model and fine-tune it on the official split 1 of UCF101, the reason you get lower accuracy than reported in the README is probably the random dataset split, as explained in a previous post.

amm040341 commented 5 years ago

@wave-transmitter
Hi~ I didn't use a pretrained model because I want to compare against my own model, which doesn't use a pretrained model either. I asked the question because I want to know whether the accuracy of the ResNet-18 I trained is normal or not.

JinyangGuo commented 5 years ago

Hi, I tried to fine-tune the model on UCF101 split 1 using the pretrained model from the README. I split each video into non-overlapping 16-frame clips, resize all video frames to 128x171, and randomly crop the training clips to size 3x16x112x112. For testing, I center crop the frames to size 3x16x112x112. The best clip-level test accuracy is 78.2%, which is lower than the C3D paper (82.3%).

The learning rate and weight decay follow train.py. Does anyone know how to improve the test accuracy?
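For comparison, here is a minimal sketch of the clip pre-processing described above (resize to 128x171, random 112x112 crop for training, center crop for testing). It operates on a clip tensor of shape (3, 16, H, W) already decoded from the video; the function name is mine, not from this repo's dataset code.

```python
import random
import torch
import torch.nn.functional as F

def preprocess_clip(clip: torch.Tensor, train: bool, crop_size: int = 112) -> torch.Tensor:
    """clip: float tensor of shape (3, 16, H, W). Resize frames to 128x171, then crop to 112x112."""
    clip = F.interpolate(clip, size=(128, 171), mode="bilinear", align_corners=False)
    _, _, h, w = clip.shape
    if train:
        # Random spatial crop for training.
        top = random.randint(0, h - crop_size)
        left = random.randint(0, w - crop_size)
    else:
        # Center crop for testing.
        top = (h - crop_size) // 2
        left = (w - crop_size) // 2
    return clip[:, :, top:top + crop_size, left:left + crop_size]
```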

BestJuly commented 5 years ago

@JinyangGuo I think in the C3D paper the 82.3% is not obtained directly from the C3D classifier; they train an SVM on the extracted features to get that accuracy. I think Figure 2 in the C3D paper shows the test accuracy is even lower than 45%. To improve the results, fine-tuning is one way; you could also try another network architecture, use an SVM on the features as in the C3D paper, or use a larger input.
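It also matters whether the number is clip-level or video-level. Papers usually report video-level accuracy, commonly obtained by averaging clip scores per video. A rough sketch of that protocol (not necessarily what this repo or the C3D paper does exactly), assuming `clip_logits` maps a video id to its list of per-clip logit tensors and `labels` maps a video id to its ground-truth class:

```python
import torch

def video_level_accuracy(clip_logits, labels):
    """Average the per-clip logits of each video, then take the argmax as the video prediction."""
    correct = 0
    for video_id, logits_list in clip_logits.items():
        avg_logits = torch.stack(logits_list).mean(dim=0)
        correct += int(avg_logits.argmax().item() == labels[video_id])
    return correct / len(clip_logits)
```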

BestJuly commented 5 years ago

@ALL @amm040341 I think it is normal that the test accuracy for official split 01 is lower than 50% if the model is trained from scratch with a similar setting. I tried reimplementing the C3D, R(2+1)D, I3D, and S3D models and found that the accuracy is very low. Using existing code from GitHub also gives similar results.

skyqwe123 commented 3 years ago

> @ALL @amm040341 I think it is normal that the test accuracy for official split 01 is lower than 50% if the model is trained from scratch with a similar setting. I tried reimplementing the C3D, R(2+1)D, I3D, and S3D models and found that the accuracy is very low. Using existing code from GitHub also gives similar results.

@BestJuly Have you tried training R2Plus1D from scratch on UCF101? I met the problem that the accuracy is very low, only about 0.0001. Do you have any good suggestions?

BestJuly commented 3 years ago

Hi @skyqwe123, sure, I have. But the accuracy you reported is strange, because for UCF101 random guessing should already give around 1% accuracy. Therefore, I think it's better to check your code to make sure there is no problem with the data loading, pre-processing, and post-processing parts. However, if you only changed the model while keeping the other settings (such as the data) the same as for training C3D/ResNet-18-3D, I have no idea what problem you have met.

skyqwe123 commented 3 years ago

@BestJuly I use this code and only changed the data path. May I have a look at your code?

hhc1997 commented 2 years ago

Hi @BestJuly, I trained R2Plus1D from scratch on UCF101 (I only use 51 labels). The test accuracy for official split 01 is only 40%, while the training accuracy is almost 100%. Is this right?

BestJuly commented 2 years ago

Hi @hhc1997, yes, I think this is right. You may find better results (usually more than 80%) in some papers, but they did not train from scratch. In the from-scratch training setting, the low performance is normal because it is easy to overfit the UCF101 training set.

I reported some results for from-scratch training with different 3D convolutional network architectures in this paper. Hope it can help you with your experiments.

hhc1997 commented 2 years ago

Hi @BestJuly, thanks for your reply. Have you tried training from a pretrained model, such as one pretrained on Kinetics? In the R(2+1)D paper, a pretrained model is used to get 96.8% accuracy on UCF101. I think the gap is huge: 45% (no pretraining) vs. 96.8% (pretrained on Kinetics).

BestJuly commented 2 years ago

@hhc1997 I have tried some networks such as 3D-ResNet-18 and R(2+1)D-18 pretrained on Kinetics-400. These models are from torchvision, and the performance can usually exceed 90% easily.

In the R(2+1)D paper and some other papers, some settings are different, such as input size, model depth, etc. Therefore, I think it is normal to achieve 96.8% if you carefully tune some parameters.
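A minimal sketch of what fine-tuning a Kinetics-400-pretrained torchvision model looks like (note: `pretrained=True` is the older torchvision API; newer releases use a `weights=` enum instead):

```python
import torch.nn as nn
from torchvision.models.video import r2plus1d_18

# Load R(2+1)D-18 with Kinetics-400 weights and replace the 400-way head
# with a 101-way classifier for UCF101 fine-tuning.
model = r2plus1d_18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 101)
```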

hhc1997 commented 2 years ago

@BestJuly Hi. I pretrained R(2+1)D on Kinetics-400, and the model gets only 70% top-1 accuracy (clip level) on UCF101 official split 1. Do you have any advice? Thanks.

fsiar commented 2 years ago

Hi, could anyone report the validation accuracy for R3D trained from scratch? Do 95% validation accuracy and 45% test accuracy make sense?

BestJuly commented 2 years ago

@hhc1997 I think the 70% is not normal. If you just use R(2+1)D from torchvision, I remember the accuracy can be at least over 80%. You might need to check the data and pre-processing parts to make sure the same data normalization (mean, std) is used.
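For reference, to the best of my knowledge torchvision's Kinetics-pretrained video models expect frames scaled to [0, 1] and normalized roughly as below; the exact values should be double checked against the torchvision docs for your version.

```python
import torch

# Per-channel mean/std used by torchvision's Kinetics-400 video presets
# (assumed values; verify against your torchvision version).
KINETICS_MEAN = torch.tensor([0.43216, 0.394666, 0.37645]).view(3, 1, 1, 1)
KINETICS_STD = torch.tensor([0.22803, 0.22145, 0.216989]).view(3, 1, 1, 1)

def normalize_clip(clip_uint8: torch.Tensor) -> torch.Tensor:
    """clip_uint8: uint8 tensor of shape (3, T, H, W) with values in [0, 255]."""
    clip = clip_uint8.float() / 255.0
    return (clip - KINETICS_MEAN) / KINETICS_STD
```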

BestJuly commented 2 years ago

@fsiar It is normal. I have reported several from-scratch training results in this paper and this paper.

Usually R3D/R(2+1)D/I3D/S3D cannot get better performance than 60% in from-scratch training settings. However, if you look at the validation set, the performance is high because the scenes/appearances are similar when you randomly split the validation data from the official training split. Overfitting is very severe in such cases.

fsiar commented 2 years ago

> @fsiar It is normal. I have reported several from-scratch training results in this paper and this paper.
>
> Usually R3D/R(2+1)D/I3D/S3D cannot get better performance than 60% in from-scratch training settings. However, if you look at the validation set, the performance is high because the scenes/appearances are similar when you randomly split the validation data from the official training split. Overfitting is very severe in such cases.

@BestJuly Thank you very much. I thought I was doing something wrong but couldn't find the reason. I also checked several things and ran several codebases. Please let me know if you have any advice on how to run a standard experiment on the UCF101 and HMDB51 datasets.