hx173149 / C3D-tensorflow

C3D is a modified version of BVLC Caffe to support 3D ConvNets; this repository is a TensorFlow port.
MIT License

UCF101 Training from Scratch #11

Closed ardasnck closed 7 years ago

ardasnck commented 7 years ago

Thank you very much for your contribution on C3D.

Is it possible to provide some information about UCF101 Training from scratch instead of finetuning? It would be very helpful to provide a graph or at least some numerical data that shows the test accuracy/loss on each epoch so that we can compare our on-going training.

Thanks.

hx173149 commented 7 years ago

Hi @ardasnck, I have been a little busy these days; I think I can do the evaluation next week.

ardasnck commented 7 years ago

@hx173149 Sure! I can't reproduce the paper's results with my own TensorFlow implementation. So if you can get similar results after your evaluation, it would be great to add your train-from-scratch implementation to this repository.

hx173149 commented 7 years ago

@ardasnck If you want to match the paper's accuracy, you must fine-tune from Sports-1M weights, as the paper states. You can also refer to issue #2; I have tried it, and without fine-tuning I only get 33% accuracy. Cheers

ardasnck commented 7 years ago

@hx173149 Yeah, I know issue #2 and have also read the official C3D documentation and the paper's discussion of fine-tuning. But my question is specifically about training from scratch (not fine-tuning). I got 40% accuracy when training from scratch, while you mentioned you only reached 33%. This document https://docs.google.com/document/d/1-QqZ3JHd76JfimY4QKqOojcEaf5g3JS0lNh-FHTxLag states that they reached 45%, so I was wondering what could explain the difference. Another observation: during training, the loss value in TensorFlow is clearly higher than in the Caffe implementation...

hx173149 commented 7 years ago

Hi @ardasnck, I think I will have some free time in the next few days, and I will reproduce my result once more... Have you tried the Caffe version of the code? Can it reach 45% accuracy when training from scratch? I am curious about this too... PS: I can't open the URL you posted above. Cheers

ardasnck commented 7 years ago

Hi @hx173149. I updated the link once again, but I'm not sure what's happening with it... As for training from scratch: yes, I ran the Caffe version of the code on my machine and got 42.88% accuracy (note that I used batch size 16 because of my GPU capacity). I also edited my own TensorFlow implementation (some minor changes) and got 42.64%. I believe this shows it works as it should. PS: In case the link doesn't work again, I was referring to the C3D User Guide document that the author provides on his project page.

hx173149 commented 7 years ago

Hi @ardasnck There are 13318 videos in the UCF101 dataset; I used 11318 videos for training and 2000 for testing, and I can get 50% top-1 accuracy after 8000 iterations with batch size 64. Below are my training-from-scratch curves for top-1 accuracy, cross entropy, and total loss (cross entropy + regularization loss): [three curve images attached]
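The two quantities tracked above (top-1 accuracy and total loss as cross entropy plus a regularization term) can be sketched in NumPy. This is a minimal illustration, not code from the repo; the `weight_decay` value is a hypothetical placeholder.

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of clips whose highest-scoring class matches the label."""
    return float(np.mean(np.argmax(logits, axis=1) == labels))

def total_loss(cross_entropy, weights, weight_decay=0.0005):
    """Total loss as described above: cross entropy plus an L2 penalty
    over the model weights (weight_decay is a hypothetical value)."""
    l2 = sum(float(np.sum(w ** 2)) for w in weights)
    return cross_entropy + weight_decay * l2

# Toy check: 3 clips, 4 classes; predictions are classes 1, 0, 3.
logits = np.array([[0.1, 2.0, 0.3, 0.0],
                   [1.5, 0.2, 0.1, 0.0],
                   [0.0, 0.1, 0.2, 3.0]])
labels = np.array([1, 0, 2])
print(top1_accuracy(logits, labels))  # 2 of 3 correct
```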

ardasnck commented 7 years ago

Dear @hx173149, thank you very much for the detailed feedback. It's great that you reached 50% top-1 accuracy. Did you use the same train/test split as the original Caffe implementation? The paper claims 45% accuracy, and when I ran their code on my own machine (batch size 16) I got 42.9%.
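To compare against the official split, one would parse the UCF101 train/test list files (e.g. `trainlist01.txt`, where each line is a video path followed by a class label, while test lists contain only paths). A small parsing sketch, working on raw lines so it is independent of file layout:

```python
def parse_split(lines):
    """Return the video paths from lines of a UCF101 split file.
    Train lists look like 'ApplyEyeMakeup/v_..._c01.avi 1'; test
    lists omit the label, so we keep only the first field."""
    return [ln.split()[0] for ln in lines if ln.strip()]

train_lines = ["ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1\n", "\n"]
print(parse_split(train_lines))
```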

gy2256 commented 7 years ago

Hello,

I also want to train from scratch, but I am fairly new to deep learning, especially 3D ConvNets. Could you briefly explain the training mechanism? As I understand it, you feed in 16 frames as input along with a label to perform supervised learning. But do you use all the frames for training? I would really appreciate it if you could briefly explain the whole data-preparation and training process.

(I am trying to rewrite everything in Keras. So far I have defined the nets, but I do not know how to prepare the video data.)

hx173149 commented 7 years ago

Hello @gyang1011 My training mechanism is as follows: first, I choose 64 samples at random for each iteration; then I slice a random 3.2-second window (about 16 frames) from each sample for training.
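The sampling mechanism described above (64 random videos per iteration, one random 16-frame window each) can be sketched roughly like this. The `video_lengths` mapping is a hypothetical stand-in for the real frame lists, and the sketch assumes every video has at least 16 frames:

```python
import random

def sample_batch(video_lengths, batch_size=64, clip_len=16):
    """Pick batch_size videos at random, then a random clip_len-frame
    window from each. video_lengths maps video id -> frame count
    (hypothetical structure, not the repo's actual data format)."""
    ids = random.sample(list(video_lengths), batch_size)
    batch = []
    for vid in ids:
        n = video_lengths[vid]
        # random start so the window [start, start+clip_len) fits
        start = random.randint(0, max(0, n - clip_len))
        batch.append((vid, start, start + clip_len))
    return batch
```

Each returned tuple identifies one clip to decode and stack into the `batch x 16 x H x W x 3` input tensor.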

LongLong-Jing commented 7 years ago

@ardasnck @hx173149 @gyang1011 I trained this network and got 33% on split 1 of UCF101. However, I think 33% is about what this 8-convolution-layer network should reach from scratch. In the C3D paper, the authors use a 5-convolution-layer network (not 8 convolution layers) for training from scratch, which is how they get 45% on UCF101. This means the architecture trained from scratch and the one pre-trained on Sports-1M are different!
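The capacity gap between the two depths can be made concrete by counting 3x3x3 convolution parameters. The layer widths below are illustrative (the 8-conv widths follow the common C3D description; the 5-conv widths are a hypothetical smaller variant, not taken from the paper):

```python
def conv3d_params(channels, k=3):
    """Parameter count of a stack of k x k x k 3D conv layers.
    channels = [input_channels, out_1, out_2, ...]."""
    total = 0
    for cin, cout in zip(channels, channels[1:]):
        total += (k ** 3) * cin * cout + cout  # weights + biases
    return total

eight_conv = [3, 64, 128, 256, 256, 512, 512, 512, 512]  # common C3D widths
five_conv = [3, 64, 128, 256, 256, 256]                  # hypothetical smaller net
print(conv3d_params(eight_conv), conv3d_params(five_conv))
```

The deeper stack has several times more convolutional parameters, which is one reason it overfits a dataset the size of UCF101 when trained from scratch.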

hx173149 commented 7 years ago

@LongLong-Jing I think you are right; or maybe there are some duplicate samples between my train list and test list, I am not very sure.