rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

Training of py-faster-rcnn on my own dataset #759

Open lumy123 opened 6 years ago

lumy123 commented 6 years ago

I am training on my own dataset in CPU-only mode. I replaced 'roi_pooling_layer' and 'smooth_L1_loss_layer' with copies that have CPU implementations and then ran make. I built my VOC2007-format dataset and modified stage1_fast_rcnn_train.pt, stage1_rpn_train.pt, stage2_fast_rcnn_train.pt, stage2_rpn_train.pt, faster_rcnn_test.pt, and pascal_voc.py. I ran the experiment script ./experiments/scripts/faster_rcnn_alt_opt.sh and everything seems to be OK at first, but then training appears to get stuck and never moves forward again; it stops at 'Solving...' with no further messages:

This network produces output rpn_cls_loss
I1225 23:48:24.740737 4022109120 net.cpp:270] This network produces output rpn_loss_bbox
I1225 23:48:24.740788 4022109120 net.cpp:283] Network initialization done.
I1225 23:48:24.740921 4022109120 solver.cpp:60] Solver scaffolding done.
Loading pretrained model weights from /usr/local/Cellar/py-faster-rcnn/data/imagenet_models/ZF.v2.caffemodel
I1225 23:48:25.139997 4022109120 net.cpp:816] Ignoring source layer pool5_spm6
I1225 23:48:25.140100 4022109120 net.cpp:816] Ignoring source layer pool5_spm6_flatten
I1225 23:48:25.164057 4022109120 net.cpp:816] Ignoring source layer drop6
I1225 23:48:25.174723 4022109120 net.cpp:816] Ignoring source layer relu7
I1225 23:48:25.174849 4022109120 net.cpp:816] Ignoring source layer drop7
I1225 23:48:25.174911 4022109120 net.cpp:816] Ignoring source layer fc8
I1225 23:48:25.174945 4022109120 net.cpp:816] Ignoring source layer prob
Solving...

Could anyone give me some suggestions? Thanks!
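For reference, the usual first step for a custom dataset is editing the class list in lib/datasets/pascal_voc.py. A minimal sketch of that edit is shown below; the class names are placeholders for your own labels, not values from this thread:

# Sketch of the class-list edit in lib/datasets/pascal_voc.py, where
# pascal_voc.__init__ sets self._classes. The names below are placeholders;
# '__background__' must stay at index 0.
CLASSES = ('__background__',  # always index 0
           'class1', 'class2', 'class3',
           'class4', 'class5', 'class6', 'class7')

# In pascal_voc.__init__ this tuple is assigned to self._classes.
# Its length is the 'num_classes' value used in the *_train.pt files.
print(len(CLASSES))  # 8 for a 7-class dataset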

pscheich commented 6 years ago

Hey lumy123, there are a lot of threads about this topic, e.g. https://github.com/rbgirshick/py-faster-rcnn/issues/600. You have to edit some files.

Greetz

pscheich commented 6 years ago

I trained with my own dataset (7 classes); this is the diff:

pscheich@el-Kisto ..dels/pascal_voc/ZF/faster_rcnn_alt_opt (git)-[master] % diff ~/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/ ~/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt 
diff /home/pscheich/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/faster_rcnn_test.pt
306c306
<     num_output: 8
---
>     num_output: 21
315c315
<     num_output: 32
---
>     num_output: 84
diff /home/pscheich/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_fast_rcnn_train.pt
14c14
<     param_str: "'num_classes': 8"
---
>     param_str: "'num_classes': 21"
247c247
<     num_output: 8
---
>     num_output: 21
266c266
<     num_output: 32
---
>     num_output: 84
diff /home/pscheich/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage1_rpn_train.pt
11c11
<     param_str: "'num_classes': 8"
---
>     param_str: "'num_classes': 21"
diff /home/pscheich/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_fast_rcnn_train.pt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_fast_rcnn_train.pt
14c14
<     param_str: "'num_classes': 8"
---
>     param_str: "'num_classes': 21"
247c247
<     num_output: 8
---
>     num_output: 21
266c266
<     num_output: 32
---
>     num_output: 84
diff /home/pscheich/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_rpn_train.pt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/faster_rcnn_alt_opt/stage2_rpn_train.pt
11c11
<     param_str: "'num_classes': 8"
---
>     param_str: "'num_classes': 21"

and

diff models/pascal_voc/ZF/fast_rcnn/train.prototxt /home/pscheich/temp/py-faster-rcnn/models/pascal_voc/ZF/fast_rcnn/train.prototxt
14c14
<     param_str: "'num_classes': 8"
---
>     param_str: "'num_classes': 21"
247c247
<     num_output: 8
---
>     num_output: 21
266c266
<     num_output: 32
---
>     num_output: 84

8 = 7 classes + background; 32 = (7 + 1) * 4

I don't know if this is the correct way, but it works for me.
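To make the arithmetic above concrete, here is a small illustrative Python helper (not part of the repo) that reproduces the num_output values in the diff: cls_score gets one output per class including background, and bbox_pred gets 4 box-regression targets per class including background.

def faster_rcnn_num_outputs(num_object_classes):
    # +1 for the implicit '__background__' class
    num_classes = num_object_classes + 1
    cls_score_outputs = num_classes        # one score per class (incl. background)
    bbox_pred_outputs = 4 * num_classes    # 4 regression targets per class
    return cls_score_outputs, bbox_pred_outputs

print(faster_rcnn_num_outputs(7))   # (8, 32)  -> the values in the diff above
print(faster_rcnn_num_outputs(20))  # (21, 84) -> the original 20-class VOC values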

cobbwho commented 6 years ago

I have the same problem as you. Did you solve it? I modified solver.prototxt, setting the field 'debug_info' to true. After 'Solving...' the shell shows some new messages like this:

I0424 net.cpp:630] [Forward] Layer cls_score, param blob 0 data: 0.00795975
I0424 net.cpp:630] [Forward] Layer cls_score, param blob 1 data: 0
I0424 net.cpp:618] [Forward] Layer bbox_pred, top blob bbox_pred data: 0.0848037
I0424 net.cpp:630] [Forward] Layer bbox_pred, param blob 0 data: 0.000797856
I0424 net.cpp:630] [Forward] Layer bbox_pred, param blob 1 data: 0
I0424 net.cpp:618] [Forward] Layer loss_cls, top blob loss_cls data: 4.00548

It seems to be something about 'roi_pooling_layer' and 'smooth_L1_loss_layer'. If you solve it, remember to tell me. Thanks anyway.

gmt710 commented 5 years ago

@cobbwho, hi. Maybe you can replace the files 'roi_pooling_layer.cpp' and 'smooth_L1_loss_layer.cpp' with the versions from https://github.com/neuleaf/faster-rcnn-cpu
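For context on what a CPU version of smooth_L1_loss_layer has to compute, here is a rough NumPy sketch of the element-wise smooth L1 function and its gradient (the missing backward pass is exactly what the CPU build lacks). Sigma is assumed to be 1 here, so this reduces to the classic Fast R-CNN form; it is only illustrative, not the layer code from either repository.

import numpy as np

def smooth_l1(diff):
    # Element-wise smooth L1: 0.5 * x^2 when |x| < 1, |x| - 0.5 otherwise.
    diff = np.asarray(diff, dtype=np.float64)
    abs_d = np.abs(diff)
    return np.where(abs_d < 1.0, 0.5 * diff ** 2, abs_d - 0.5)

def smooth_l1_grad(diff):
    # Gradient w.r.t. the input difference: x when |x| < 1, sign(x) otherwise.
    diff = np.asarray(diff, dtype=np.float64)
    return np.where(np.abs(diff) < 1.0, diff, np.sign(diff))

print(smooth_l1([-2.0, -0.5, 0.0, 0.5, 2.0]))       # [1.5, 0.125, 0.0, 0.125, 1.5]
print(smooth_l1_grad([-2.0, -0.5, 0.0, 0.5, 2.0]))  # [-1.0, -0.5, 0.0, 0.5, 1.0]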

cobbwho commented 5 years ago

@gmt710 thanks anyway. Actually, I have already solved this problem, but so much time has passed that I have forgotten exactly how I did it. The root cause is that these two files do not have corresponding backpropagation code in their CPU versions. On my local computer there are two files that I can confirm work for me. However, I have been a bit busy recently; if I find time, I will check how they differ. Thank you so much.