Closed: @bityangke closed this issue 6 years ago.
Thanks for the issue. Let me have a look.
Actually, the model architecture is extracted from the released tensorflow protobuf file in Dec. 2016. We retrained the model weights on ImageNet and Kinetics using Caffe.
@yjxiong Hi Yuanjun, could you please tell me what mean values you use for your models (RGB and flow)? Thanks!
Hi @bityangke
I think your finding is correct. It's possible that during extraction I switched the order of the separable filter sizes. But this will not affect the performance as long as we keep to the structure in the Caffe proto. The model achieved 94.15% single-crop top-5 accuracy on ILSVRC12. You can modify the original InceptionV3 code to adapt to this change if you would like to use the weights in TF.
The mean values are the same as those of the other released TSN models.
[104 117 123] for RGB
128 for Flow
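For reference, a minimal sketch of how these means would typically be applied, assuming the usual Caffe BGR channel order (the function names and shapes here are illustrative, not from the released code):

```python
import numpy as np

# Caffe loads images in BGR order, so [104, 117, 123] maps to (B, G, R).
RGB_MEAN_BGR = np.array([104.0, 117.0, 123.0], dtype=np.float32)
FLOW_MEAN = np.float32(128.0)

def preprocess_rgb(frame_bgr):
    """frame_bgr: HxWx3 uint8 image in BGR order; subtract per-channel mean."""
    return frame_bgr.astype(np.float32) - RGB_MEAN_BGR

def preprocess_flow(flow_img):
    """flow_img: HxW uint8 flow field quantized around 128; center it at zero."""
    return flow_img.astype(np.float32) - FLOW_MEAN
```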
Thanks very much!
One more question: what input image size do you use to crop the 299 x 299 patches? I think it was 341 x 452, but I am not sure about this. Thanks in advance!
Yes, you are right. For testing 10-crops, the images are first scaled to width 452 and height 341.
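The 10 crops are usually the four corners plus the center of the rescaled image, together with their horizontal flips. A minimal sketch of that protocol, assuming a 341 × 452 (H × W) input (illustrative only; the released test code may differ in details):

```python
import numpy as np

def ten_crop(img, crop=299):
    """img: HxWxC array already rescaled (e.g. to 341 x 452).
    Returns the classic 10 crops: 4 corners + center, plus their mirrors."""
    h, w = img.shape[:2]
    offsets = [
        (0, 0), (0, w - crop),                # top-left, top-right
        (h - crop, 0), (h - crop, w - crop),  # bottom-left, bottom-right
        ((h - crop) // 2, (w - crop) // 2),   # center
    ]
    crops = [img[y:y + crop, x:x + crop] for y, x in offsets]
    crops += [c[:, ::-1] for c in crops]      # horizontal flips
    return crops
```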
Do you have any results for these two models fine-tuned on UCF101? Thanks.
@bityangke We have updated the website with finetuning performance on UCF101.
You can find it on http://yjxiong.me/others/kinetics_action/#transfer
Thanks very much! The performance is amazing! How did you schedule the learning rate for both nets?
@bityangke Maybe just keep the learning rate settings the same as in the original TSN training (ImageNet pretrained). I am launching the fine-tuning with only the weights file changed.
Thanks for sharing your experiences. @Tonyfy I will try it!
For UCF101, I simply used the 0.001 initial learning rate, decayed by 10 times every 10 epochs, for a total of 30 epochs. All BN layers are fixed. No other change is needed.
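To make that schedule concrete: with an effective batch size of 128 (32 per GPU × 4 GPUs, as used elsewhere in this thread) and ~9537 UCF101 training videos, one epoch is roughly 75 iterations, so a solver fragment for this schedule might look like the following (standard Caffe solver fields; the exact iteration counts are my own back-of-envelope numbers, not the released configuration):

```
# Hypothetical Caffe solver fragment for the schedule described above.
# ~9537 videos / 128 effective batch ≈ 75 iterations per epoch.
base_lr: 0.001
lr_policy: "step"
gamma: 0.1        # decay by 10x
stepsize: 750     # every ~10 epochs
max_iter: 2250    # ~30 epochs in total
```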
Hi @yjxiong, I see that your reported pretraining performance on UCF101 is as below:
Model | Pretraining | RGB | Flow | RGB+Flow
-- | -- | -- | -- | --
BNInception | ImageNet only | 85.4% | 89.4% | 94.9%
BNInception | ImageNet + Kinetics | 91.1% | 95.2% | 97.0%
How do you fine-tune with ImageNet + Kinetics? Do you first fine-tune the ImageNet pretrained model on Kinetics, and then fine-tune the resulting model on UCF101? I used the Kinetics pretrained model you released to fine-tune on UCF101 split 1 with the RGB modality, but only obtained 34% accuracy. Lots of thanks~
@Tonyfy
They are achieved by fine-tuning the released models on UCF101. Please check your settings.
@yjxiong
I am fine-tuning the tsn_bn_inception_rgb model on ucf101_split1 with the Kinetics pretrained model bn_inception_kinetics_rgb_pretrained.caffemodel. Here are the settings I modified:
1. In tsn_bn_inception_rgb_train_val.prototxt, change the bn_param frozen from "false" to "true" in conv1 (the other BN layers already have frozen set to true).
2. In tsn_bn_inception_rgb_solver.prototxt, change stepsize from 1500 to 5000 (10 epochs) and max_iter from 3500 to 16000 (30 epochs plus 1000 more iterations).
The other settings are kept unchanged, i.e. 4 GPUs, train_batch_size=32, test_interval: 500 (one training epoch: 500 × 32 × 4 × 3 (seg_num) ≈ 9537 (videos) × 25 (snippets)), gamma=0.1. The fine-tuning result is as below:
I0930 12:20:45.154777 14451 solver.cpp:240] Iteration 15960, loss = 0.786447
I0930 12:20:45.154932 14451 solver.cpp:255] Train net output #0: loss = 0.757018 (* 1 = 0.757018 loss)
I0930 12:20:45.154943 14451 solver.cpp:640] Iteration 15960, lr = 1e-06
I0930 12:20:55.122572 14451 solver.cpp:240] Iteration 15980, loss = 0.824895
I0930 12:20:55.122633 14451 solver.cpp:255] Train net output #0: loss = 0.842794 (* 1 = 0.842794 loss)
I0930 12:20:55.122649 14451 solver.cpp:640] Iteration 15980, lr = 1e-06
I0930 12:21:04.659654 14451 solver.cpp:511] Snapshotting to models/ucf101_split1_tsn_kinetics_rgb_bn_inception/_iter_16000.caffemodel
I0930 12:21:04.782093 14451 solver.cpp:519] Snapshotting solver state to models/ucf101_split1_tsn_kinetics_rgb_bn_inception/_iter_16000.solverstate
I0930 12:21:05.033846 14451 solver.cpp:415] Iteration 16000, loss = 0.483241
I0930 12:21:05.033885 14451 solver.cpp:433] Iteration 16000, Testing net (#0)
I0930 12:21:23.307076 14451 solver.cpp:490] Test net output #0: accuracy = 0.345526
I0930 12:21:23.307193 14451 solver.cpp:490] Test net output #1: loss = 3.61031 (* 1 = 3.61031 loss)
I0930 12:21:23.307201 14451 solver.cpp:420] Optimization Done.
I0930 12:21:23.307205 14451 caffe.cpp:203] Optimization Done.
@Tonyfy
I don't know how you calculated the iterations, but 10 epochs should be around 700 iterations given the batch size of 128 and the UCF101 training set size of 9000+. So the max iteration count should be around 2200. The learning rate will never be decayed to 1e-6 in this setting.
Also, the training loss looks way too high in your log, even higher than in the ImageNet pretraining case. Make sure you are loading the correct pretrained weights via --weights.
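To double-check the arithmetic: with 32 images per GPU on 4 GPUs (the settings quoted earlier in this thread), the iteration counts work out as follows (a quick sketch, not from the released code):

```python
# Sanity-check the iteration math discussed in this thread.
videos = 9537            # UCF101 split-1 training videos
batch = 32 * 4           # 32 per GPU x 4 GPUs = 128 effective batch

iters_per_epoch = videos / batch          # ~74.5 iterations per epoch
stepsize = round(10 * iters_per_epoch)    # decay every ~10 epochs
max_iter = round(30 * iters_per_epoch)    # ~30 epochs in total

print(iters_per_epoch, stepsize, max_iter)
```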
@yjxiong, thanks for your reply. I will try your settings and do a careful check.
And Happy National Day & Moon Festival!
@yjxiong Hello, you said that for UCF101 you simply used a 0.001 initial learning rate, decayed by 10x every 10 epochs, for a total of 30 epochs. I have two questions. First, I know 0.001 is for the spatial net; how about the temporal net? I find the lr is 0.001 for RGB and 0.005 for flow in the original TSN solvers. Second, I find the batch size is 128 (32 x 4 x 1) in the original TSN solvers, and the RGB solver decays by 10x every 20 epochs (1500 iters) for a total of 45 epochs (3500 iters), while the flow solver decays by 10x at [10000, 16000] for a total of 18000 iters. This is different from decaying by 10x every 10 epochs for a total of 30 epochs. Happy National Day & Moon Festival.
@whwu95
As I said, the flow model uses the same settings.
This difference in learning strategy comes naturally from the much smaller domain gap when using Kinetics for pretraining (video recognition to video recognition vs. image object recognition to video recognition).
This is more evident for the flow model. In the ImageNet-only case, the flow model is initialized by cross-modality pretraining and takes a lot of iterations to adapt to flow input, which requires higher learning rates. That adaptation, however, is already done during the pretraining of the Kinetics models and is not needed later on.
I have fine-tuned the InceptionV3 RGB net on UCF101 split 1, ending at 93.18% accuracy (25 frames, 10 crops). But the flow net's accuracy is only about 91%. I save the flow images as .jpeg files at the video's original resolution and resize them to 341x453 when used. Does this cause any problem? I think it might matter when the motion vector is very small, since small motions could be affected by the loss from resizing.
Resizing is OK in our experience. We also use on-the-fly resizing for the Inception V3 models.
@yjxiong @bityangke In the original TSN, the mean value is
[104 117 123] for RGB,
not [104 117 128].
@zhujiagang Noted with thanks.
Hi @yjxiong @bityangke, thank you for publishing this wonderful model. However, because I use Keras, I cannot use it (and I could not convert it well). Could you release a converted model for Keras?
Thank you.
Hi Yuanjun, we have checked the Caffe model and prototxt file (Inception V3) you released a few days ago. I found that the dimensions of all the decomposed convolution filters (7x7 -> 1x7, 7x1) are the reverse of those in the Inception V3 models I have used (both Caffe and TF): e.g., your kernel sizes are 1x7, 7x1 where they have 7x1, 1x7, and yours are 7x1, 1x7 where they have 1x7, 7x1. This confuses me, because I want to convert your Inception V3 model weights to TensorFlow and Keras weights so that more people can enjoy this great work. I have "converted" the weights to Keras (TensorFlow backend), but the video test results are not correct.
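For anyone attempting the same conversion: independent of the swapped 1x7/7x1 ordering (which has to be mirrored in the model definition itself, not by transposing trained weights), the Caffe blob layout must be permuted into the Keras/TensorFlow kernel layout. A minimal sketch, assuming standard layouts (the function name is mine):

```python
import numpy as np

def caffe_conv_to_keras(blob):
    """Permute a Caffe conv blob (out_ch, in_ch, kH, kW) into the
    Keras/TensorFlow kernel layout (kH, kW, in_ch, out_ch).
    A layout sketch only; it does not fix a swapped 1x7/7x1 layer
    order, which must instead be mirrored in the model definition."""
    return np.transpose(blob, (2, 3, 1, 0))
```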