train_test_split on ucf 101 dataset

jfzhang95 / pytorch-video-recognition

PyTorch implemented C3D, R3D, R2Plus1D models for video activity recognition.

MIT License


train_test_split on ucf 101 dataset #16

Closed kjunhwa closed 5 years ago

kjunhwa commented 5 years ago

Hi. Thank you for uploading the code.

I have a question in the dataset.py code.

As far as I know, the UCF101 dataset provides official train/test list files, but this code splits the data randomly with train_test_split.

So it may cause overlap between the train and test sets.

What do you think about this problem?

wave-transmitter commented 5 years ago

Hi, this issue is similar to #14. As noted on the official UCF101 website:

It is very important to keep the videos belonging to the same group separate in training and testing. Since the videos in a group are obtained from a single long video, sharing videos from the same group in training and testing sets would give high performance.

I think this is the reason why the random split approach achieved such high performance.
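To make the group issue concrete, here is a minimal sketch of a split that keeps whole groups together. It relies on the UCF101 clip naming pattern `v_<Class>_g<group>_c<clip>.avi`. The helpers `group_key`, `split_by_group` and `shared_groups` are hypothetical names introduced for illustration, not part of this repository, and the split is not stratified by class.

```python
import random
import re
from collections import defaultdict

# UCF101 clip names look like v_ApplyEyeMakeup_g01_c01.avi
_NAME_RE = re.compile(r"v_(.+)_g(\d+)_c(\d+)\.avi")


def group_key(filename):
    """Return (class_name, group_id) parsed from a UCF101 clip name."""
    base = filename.rsplit("/", 1)[-1]  # tolerate "Class/v_Class_g01_c01.avi"
    m = _NAME_RE.fullmatch(base)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    return m.group(1), m.group(2)


def split_by_group(filenames, test_fraction=0.2, seed=0):
    """Hypothetical helper: send every clip of a group to the same side."""
    by_group = defaultdict(list)
    for name in filenames:
        by_group[group_key(name)].append(name)
    groups = sorted(by_group)
    random.Random(seed).shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test = [f for g in groups[:n_test] for f in by_group[g]]
    train = [f for g in groups[n_test:] for f in by_group[g]]
    return train, test


def shared_groups(train, test):
    """Groups that appear on both sides; empty means no group leakage."""
    return {group_key(f) for f in train} & {group_key(f) for f in test}
```

Running `shared_groups` on the output of a per-clip random split would typically return a non-empty set, which is the overlap described above.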

jfzhang95 commented 5 years ago

Yes, I simply split the dataset using the sklearn package, which leads to very high performance. You can download the official train/test lists and rewrite the dataloader to load the official train and test sets.
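As a rough sketch of that suggestion, the official list files can be parsed into (relative path, class name) pairs before being handed to a dataloader. This assumes the list format I recall from the official split archive: train lists such as `trainlist01.txt` hold lines like `ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1`, while test lists such as `testlist01.txt` hold only the path. The function names are hypothetical and this is not the repository's own loader.

```python
def parse_split_lines(lines):
    """Parse official-style split lines into (relative_path, class_name).

    Assumed line format: "<Class>/<clip>.avi [label]"; the optional
    numeric label is ignored because the class is the directory name.
    """
    samples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        rel_path = line.split()[0]
        class_name = rel_path.split("/")[0]
        samples.append((rel_path, class_name))
    return samples


def read_split_file(list_path):
    """Read one split list file (path supplied by the caller)."""
    with open(list_path) as f:
        return parse_split_lines(f)
```

The resulting pairs can replace the randomly split file lists, so the train and test sets follow the official group separation.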

kjunhwa commented 5 years ago

@wave-transmitter @jfzhang95

Thank you so much.