Open 121649982 opened 4 years ago
Hi, I have not trained MobileNetV2 on UCF by myself. I would suggest you fine-tune from the Kinetics pre-trained weights since smaller models are generally more difficult to train.
Hi, I think your reply is very useful, but I cannot find the pretrained model you mention (for MobileNetV2). Is it provided anywhere?
@bravewhh The code will automatically download the model.
I typed the following command:
python main.py jester RGB \
--arch mobilenetv2 --num_segments 8 \
--gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 \
--batch-size 8 -j 8 --dropout 0.8 --consensus_type=avg --eval-freq=1 \
--shift --shift_div=8 --shift_place=blockres \
--tune_from=online_demo/mobilenetv2_jester_online.pth.tar
I trained for 15 epochs. Then I tried to run this model in online_demo/main_windows.py. Because the checkpoint is raw (its state_dict keys don't match the demo model), I wrote a function to rename them:
import torch

# used for renaming a self-trained model checkpoint
def rename_state_dict(pth_path):
    pth = torch.load(pth_path)
    state_dict = pth['state_dict']
    new_state_dict = dict()
    for k, v in state_dict.items():
        if k.startswith('module.base_model.'):
            new_state_dict[k.replace('module.base_model.', '').replace('.net', '')] = v
        elif k.startswith('module.new_fc'):
            new_state_dict[k.replace('module.new_fc', 'classifier').replace('.net', '')] = v
    for k, v in new_state_dict.items():
        print(k)
    return new_state_dict
and changed
torch_module.load_state_dict(torch.load(model_path))
to
torch_module.load_state_dict(rename_state_dict(model_path)).
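The renaming logic above can be sanity-checked without a real checkpoint by running it on a few stand-in keys (the keys and values below are illustrative, not taken from an actual file):

```python
# Stand-in state_dict keys in the layout saved by DataParallel-wrapped
# TSN training ('module.' prefix, '.net' wrapper from the shift module).
# Values are fake; a real checkpoint maps these keys to tensors.
raw = {
    'module.base_model.features.0.0.weight': 'w0',
    'module.base_model.features.2.conv.0.net.weight': 'w1',
    'module.new_fc.weight': 'fc_w',
    'module.new_fc.bias': 'fc_b',
}

def rename_keys(state_dict):
    """Strip the 'module.base_model.' prefix and the '.net' wrapper,
    and map the new fc layer to the backbone's 'classifier' name."""
    new_state_dict = {}
    for k, v in state_dict.items():
        if k.startswith('module.base_model.'):
            new_state_dict[k.replace('module.base_model.', '').replace('.net', '')] = v
        elif k.startswith('module.new_fc'):
            new_state_dict[k.replace('module.new_fc', 'classifier').replace('.net', '')] = v
    return new_state_dict

print(sorted(rename_keys(raw)))
```

If the printed keys do not match the names the demo model expects, load_state_dict will either raise or (with strict=False) silently skip weights, which is one plausible cause of a model that never leaves its default prediction.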
Finally I ran the demo and opened the camera, but the result is always "No gesture" no matter what gesture I make; the "No gesture" score is always very high:
avg_logit is [[ 22.340822 6.5533843 50.539978 -6.065401 -0.741372 -20.378466
-6.2159 -20.626362 -20.334791 7.939289 5.943706 -23.043537
-22.903278 4.4982567 9.428827 0.0870559 -26.094301 -22.054436
-1.2793571 -1.4129375 4.517989 -3.4708936 -0.2565832 14.422895
16.626217 15.432453 16.409939 ]]
279 frame, recognition result is No gesture
avg_logit is [[ 22.17796 6.2511935 49.412605 -5.6404104 -0.52879286
-19.771511 -5.904038 -20.105488 -19.830935 7.5743732
5.7956963 -22.469093 -22.24118 4.3991537 9.207253
0.12220014 -25.271044 -21.428812 -1.1572013 -1.3778756
4.4399977 -3.5914447 -0.50066507 13.804983 15.938972
14.783237 15.780186 ]]
281 frame, recognition result is No gesture
avg_logit is [[ 22.180958 6.1000786 48.909077 -5.4299574 -0.4247962
-19.500866 -5.760176 -19.859396 -19.579912 7.403986
5.703137 -22.20191 -21.919268 4.3430147 9.11828
0.11797364 -24.868698 -21.108702 -1.1050811 -1.3677037
4.409947 -3.6534653 -0.6306344 13.488837 15.595351
14.445818 15.466529 ]]
283 frame, recognition result is No gesture
avg_logit is [[ 22.055128 5.826027 47.89605 -5.0477147 -0.2218489
-18.961802 -5.473299 -19.398214 -19.121082 7.0886984
5.5682364 -21.691662 -21.338387 4.2532845 8.919545
0.14981724 -24.133896 -20.533306 -0.9922131 -1.3438832
4.32492 -3.7541132 -0.83523726 12.927706 14.965378
13.861613 14.88865 ]]
285 frame, recognition result is No gesture
avg_logit is [[ 21.968521 5.602677 47.0569 -4.7339234 -0.05610415
-18.5347 -5.2590575 -19.011139 -18.742971 6.8353863
5.435157 -21.298359 -20.89486 4.1619096 8.77436
0.17440723 -23.564438 -20.085413 -0.89869905 -1.3255008
4.2516346 -3.8005748 -0.9630685 12.502383 14.475084
13.391574 14.422092 ]]
287 frame, recognition result is No gesture
avg_logit is [[ 21.748917 5.5050898 45.88289 -4.499616 0.12401614
-18.094845 -5.0777416 -18.469624 -18.181099 6.670557
5.2405853 -21.099339 -20.68506 3.9974198 8.781888
0.15012178 -23.352407 -19.963812 -0.86198545 -1.3199841
4.132883 -3.6076944 -0.78165925 12.34535 14.163363
13.083709 14.054122 ]]
289 frame, recognition result is No gesture
avg_logit is [[ 21.63152 5.3609514 45.644783 -4.3256407 0.17421141
-17.86486 -4.9670362 -18.401573 -18.158514 6.553738
5.272513 -20.759995 -20.294794 4.0131555 8.547821
0.24509555 -22.870964 -19.568396 -0.7856485 -1.3132954
4.087539 -3.7881541 -1.0165278 11.99528 13.850357
12.8261795 13.800946 ]]
291 frame, recognition result is No gesture
avg_logit is [[ 21.46786 5.359134 45.42929 -4.2824726 0.16652384
-17.696587 -4.950445 -18.357807 -18.141844 6.5484796
5.286404 -20.666899 -20.198881 3.9980087 8.476922
0.31411338 -22.725739 -19.488968 -0.7481018 -1.3360809
4.068129 -3.8019323 -1.0203681 11.929721 13.777028
12.766082 13.7178 ]]
293 frame, recognition result is No gesture
avg_logit is [[ 21.34182 5.3350086 44.971577 -4.2091866 0.20372805
-17.47935 -4.890396 -18.138481 -17.923254 6.5156717
5.185838 -20.651243 -20.190723 3.8889291 8.501047
0.28822786 -22.658953 -19.489553 -0.7470191 -1.2915494
4.0240216 -3.6551523 -0.8475129 11.91179 13.651511
12.679753 13.562803 ]]
295 frame, recognition result is No gesture
avg_logit is [[ 21.460854 5.2326274 44.53007 -4.0520663 0.28569117
-17.326513 -4.831665 -17.851751 -17.625885 6.4168
5.0186796 -20.650888 -20.191128 3.7345753 8.555279
0.24853873 -22.55812 -19.406641 -0.7191258 -1.2404392
3.9580843 -3.5137684 -0.71063286 11.840729 13.461693
12.496143 13.329902 ]]
297 frame, recognition result is No gesture
avg_logit is [[ 21.543562 5.248145 44.71801 -4.100203 0.22891116
-17.387608 -4.8854446 -17.913378 -17.694712 6.4584913
5.0214977 -20.704025 -20.25106 3.7124085 8.571009
0.23945399 -22.593369 -19.43127 -0.7407799 -1.222097
3.961918 -3.495773 -0.68020856 11.870217 13.499371
12.54111 13.37663 ]]
299 frame, recognition result is No gesture
avg_logit is [[ 21.361473 5.3155193 44.3853 -4.093155 0.23724303
-17.221155 -4.8600945 -17.752514 -17.504498 6.4969296
4.9664264 -20.84703 -20.431925 3.6308913 8.699615
0.23598577 -22.709074 -19.605043 -0.7234338 -1.2285644
3.9669604 -3.3336782 -0.4685365 11.970868 13.506671
12.561577 13.333607 ]]
@tonylins could you please give me some insight and advice? Thank you in advance! This is my trained model checkpoint: ckpt.best.pth.tar.zip
Have you solved this issue? I met the same problem.
@hzz765 No. Afterwards I paused this project and moved on to another project...
Have you solved this issue? I met the same problem.
Same issue here. Have you solved the issue? @hzz765 @wwdok
Also, did you change the bi-directional shift to a uni-directional shift for training? Do you think the problem may be related to that?
@tonylins could you please give us some insight and advice? Thank you
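For context on the bi- vs uni-directional question: in offline TSM one fold of channels is shifted toward the past and one toward the future, while an online/streaming model can only shift from cached past frames. A toy sketch of the difference on a T×C list of frames (shapes and fold proportions are illustrative, not the repo's actual implementation):

```python
def shift_bidirectional(x, fold):
    """x: list of T frames, each a list of C channel values.
    Channels [0, fold) take values from the next frame, channels
    [fold, 2*fold) from the previous frame; zero-padded at the borders."""
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:                       # shifted from the future frame
                out[t][c] = x[t + 1][c] if t + 1 < T else 0.0
            elif c < 2 * fold:                 # shifted from the past frame
                out[t][c] = x[t - 1][c] if t - 1 >= 0 else 0.0
            else:                              # untouched channels
                out[t][c] = x[t][c]
    return out

def shift_unidirectional(x, fold):
    """Online variant: only past frames exist at inference time, so the
    shifted channels all come from the previous (cached) frame."""
    T, C = len(x), len(x[0])
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            if c < fold:
                out[t][c] = x[t - 1][c] if t - 1 >= 0 else 0.0
            else:
                out[t][c] = x[t][c]
    return out

x = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]  # T=3 frames, C=4 channels
print(shift_bidirectional(x, 1))
print(shift_unidirectional(x, 1))
```

The two variants learn different temporal statistics, so a checkpoint trained with one shift direction would plausibly behave badly if loaded into a demo that runs the other; that seems worth ruling out.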
Same issue here. Have you solved the issue?
I met the same error. Have you solved the issue?
Could you share how you ran the online demo with MobileNetV2 fine-tuned from the pre-trained weights? I faced an issue at sd = sd['state_dict'].
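Regarding the sd = sd['state_dict'] error: checkpoints saved during training are typically dicts that wrap the weights under a 'state_dict' key, while the released online-demo file may already be a bare state_dict, so indexing fails on one of the two layouts. A defensive unwrap that handles both (sketch; the fake dicts below stand in for real checkpoints loaded with torch.load):

```python
def unwrap_checkpoint(ckpt):
    """Return the bare state_dict whether `ckpt` is a full training
    checkpoint ({'state_dict': ..., 'epoch': ...}) or already bare."""
    if isinstance(ckpt, dict) and 'state_dict' in ckpt:
        return ckpt['state_dict']
    return ckpt

# Both layouts yield the same weights (fake values for illustration):
wrapped = {'epoch': 15, 'state_dict': {'classifier.weight': 0}}
bare = {'classifier.weight': 0}
print(unwrap_checkpoint(wrapped) == unwrap_checkpoint(bare))  # True
```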
Thank you very much for your codebase. I have trained my own data with ResNet-50 successfully, but when I train with MobileNetV2 the accuracy is very low.
python main.py ucf101 RGB --arch mobilenetv2 --num_segments 8 --gd 20 --lr 0.001 --lr_steps 10 20 --epochs 25 --batch-size 2 -j 16 --dropout 0.8 --consensus_type=avg --eval-freq=1 --shift --shift_div=8 --shift_place=blockres
Freezing BatchNorm2D except the first one.
Epoch: [24][0/104], lr: 0.00001  Time 15.333 (15.333)  Data 15.214 (15.214)  Loss 0.6946 (0.6946)  Prec@1 50.000 (50.000)  Prec@5 100.000 (100.000)
Epoch: [24][20/104], lr: 0.00001  Time 0.085 (0.815)  Data 0.000 (0.725)  Loss 0.6946 (0.6896)  Prec@1 50.000 (54.762)  Prec@5 100.000 (100.000)
Epoch: [24][40/104], lr: 0.00001  Time 0.084 (0.459)  Data 0.000 (0.371)  Loss 0.6947 (0.6907)  Prec@1 50.000 (53.659)  Prec@5 100.000 (100.000)
Epoch: [24][60/104], lr: 0.00001  Time 0.086 (0.336)  Data 0.000 (0.250)  Loss 0.6946 (0.6894)  Prec@1 50.000 (54.918)  Prec@5 100.000 (100.000)
Epoch: [24][80/104], lr: 0.00001  Time 0.082 (0.274)  Data 0.000 (0.188)  Loss 0.6391 (0.6893)  Prec@1 100.000 (54.938)  Prec@5 100.000 (100.000)
Epoch: [24][100/104], lr: 0.00001  Time 0.084 (0.236)  Data 0.000 (0.151)  Loss 0.6946 (0.6926)  Prec@1 50.000 (51.980)  Prec@5 100.000 (100.000)
Test: [0/12]  Time 2.424 (2.424)  Loss 0.7487 (0.7487)  Prec@1 0.000 (0.000)  Prec@5 100.000 (100.000)
Testing Results: Prec@1 52.174 Prec@5 100.000 Loss 0.69226
Best Prec@1: 52.174
why?
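One clue in the log above (an inference worth verifying against your train/val list files): the loss is pinned near 0.6931 ≈ ln(2), which is exactly the cross-entropy of a uniform guess over two classes, and Prec@1 hovers around 50% with Prec@5 always 100%. That pattern is what you would see if the data list effectively contains only two classes and the model is predicting at chance, rather than a MobileNetV2-specific accuracy problem. A quick check of the chance-level loss:

```python
import math

# Cross-entropy of a uniform prediction over k classes is ln(k).
for k in (2, 101):
    print(k, round(math.log(k), 5))  # 2 -> 0.69315 (matches the log); 101 -> 4.61512
```

If the full UCF101 label set were in play, a random-guessing model would instead sit near 4.615, so it may be worth double-checking how many classes the generated file lists actually contain.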