kenshohara / 3D-ResNets-PyTorch

3D ResNets for Action Recognition (CVPR 2018)
MIT License

Training of 2nd, 3rd Split on HMDB51 #229

Closed Deepanshu-beep closed 3 years ago

Deepanshu-beep commented 3 years ago

@kenshohara There are 3 JSON annotation files, one for each HMDB51 split. During training we can only specify one JSON annotation file, so how can we combine these 3? Or should we train on just one, and if so, which one was originally used for the results reported in the paper?

Deepanshu-beep commented 3 years ago

Sorry for raising the issue; I found the answer: your results are the average over all 3 splits.

Dongjiuqing commented 3 years ago

> Sorry for raising the issue; I found the answer: your results are the average over all 3 splits.

What do you mean? Do you mean training three times and then averaging the three results?

Deepanshu-beep commented 3 years ago

Yes. I actually misinterpreted the meaning of "split" here. I thought HMDB51 was divided into 3 partitions, but it is not. Rather, each split covers the whole dataset; the only difference is which videos are marked for training and which for testing in each of the 3 splits.
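To make that concrete, here is a minimal sketch (the video names and labels below are made up for illustration and are not the repository's actual JSON schema): every split lists the same videos, and only the training/validation assignment changes between splits.

```python
# Hypothetical illustration: each HMDB51 split annotation covers ALL videos;
# only the subset label (training vs. validation) differs between splits.
split1 = {"brush_hair_0": "training", "brush_hair_1": "validation"}
split2 = {"brush_hair_0": "validation", "brush_hair_1": "training"}

# Both splits contain the whole dataset -- same set of video keys.
assert set(split1) == set(split2)

# What changes is which videos are held out for testing in each split.
assert split1["brush_hair_0"] != split2["brush_hair_0"]
```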

And regarding the results, yes. According to the paper, kenshohara trained/fine-tuned the model three times for the HMDB51 results, and the final numbers are the average over these 3 splits. You can also refer to Figure 4 of his paper "Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?" for more details on this topic.
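The averaging step itself is just the mean of the three per-split accuracies. A minimal sketch (the accuracy values below are made up, not numbers from the paper):

```python
# Hypothetical per-split accuracies from three separate training runs,
# one run per official HMDB51 split.
split_accuracies = [0.701, 0.689, 0.695]

# The reported figure is the mean over the three splits.
mean_accuracy = sum(split_accuracies) / len(split_accuracies)
print(f"average over 3 splits: {mean_accuracy:.3f}")
```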

Dongjiuqing commented 3 years ago

> Yes. I actually misinterpreted the meaning of "split" here. I thought HMDB51 was divided into 3 partitions, but it is not. Rather, each split covers the whole dataset; the only difference is which videos are marked for training and which for testing in each of the 3 splits.

Thank you for your reply! Now I understand the meaning of the splits. The result in 'val.log' is the clip accuracy. I learned a lot from your reply!