An error when doing snapshot

deartonym commented 9 years ago

I am new to Caffe and FRCN, so apologies if this is a naive question. I am training a detector on 2-class data (one foreground class plus background). I got an error when the snapshot was taken at iteration 10,000. The corresponding log is below:

I0715 11:37:58.141914 22505 solver.cpp:464] Iteration 9940, lr = 0.001
I0715 11:38:06.983641 22505 solver.cpp:189] Iteration 9960, loss = 0.148392
I0715 11:38:06.983675 22505 solver.cpp:204] Train net output #0: loss_bbox = 0.0771341 (* 1 = 0.0771341 loss)
I0715 11:38:06.983683 22505 solver.cpp:204] Train net output #1: loss_cls = 0.0712577 (* 1 = 0.0712577 loss)
I0715 11:38:06.983691 22505 solver.cpp:464] Iteration 9960, lr = 0.001
I0715 11:38:15.777523 22505 solver.cpp:189] Iteration 9980, loss = 0.0926433
I0715 11:38:15.777556 22505 solver.cpp:204] Train net output #0: loss_bbox = 0.0715442 (* 1 = 0.0715442 loss)
I0715 11:38:15.777565 22505 solver.cpp:204] Train net output #1: loss_cls = 0.0210991 (* 1 = 0.0210991 loss)
I0715 11:38:15.777572 22505 solver.cpp:464] Iteration 9980, lr = 0.001
speed: 0.441s / iter
Traceback (most recent call last):
  File "./tools/train_net.py", line 87, in <module>
    max_iters=args.max_iters)
  File "/home/shu/Documents/Deeplearning/fast-rcnn/tools/../lib/fast_rcnn/train.py", line 123, in train_net
    sw.train_model(max_iters)
  File "/home/shu/Documents/Deeplearning/fast-rcnn/tools/../lib/fast_rcnn/train.py", line 98, in train_model
    self.snapshot()
  File "/home/shu/Documents/Deeplearning/fast-rcnn/tools/../lib/fast_rcnn/train.py", line 62, in snapshot
    self.bbox_stds[:, np.newaxis])
ValueError: operands could not be broadcast together with shapes (84,1024) (8,1)

Does anyone have any idea what causes this broadcasting error? Thank you.
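For context, here is a minimal NumPy sketch of the failure. It assumes the snapshot step scales the 'bbox_pred' weights row by row with the per-class box-regression standard deviations (the traceback shows `self.bbox_stds[:, np.newaxis]` as the right-hand operand). The function name `unnormalize` and the dummy values are placeholders for illustration, not the real network state.

```python
import numpy as np

def unnormalize(weights, bbox_stds):
    """Scale each row of `weights` by the matching entry of `bbox_stds`."""
    # bbox_stds[:, np.newaxis] has shape (N, 1), so `weights` must have N rows.
    return weights * bbox_stds[:, np.newaxis]

# A 21-class prototxt gives bbox_pred 4 * 21 = 84 output rows.
weights_21 = np.ones((84, 1024), dtype=np.float32)
# A 2-class dataset yields only 4 * 2 = 8 standard deviations.
stds_2 = np.full(8, 0.1, dtype=np.float32)

try:
    unnormalize(weights_21, stds_2)
    error_message = None
except ValueError as exc:
    # Same kind of error as in the log: (84,1024) against (8,1).
    error_message = str(exc)
print(error_message)

# With 8 output rows in bbox_pred the shapes agree.
weights_2 = np.ones((8, 1024), dtype=np.float32)
scaled = unnormalize(weights_2, stds_2)
print(scaled.shape)
```

The 84 in the error is 4 box coordinates times 21 classes, and the 8 is 4 times 2, which suggests the network definition still describes 21 classes while the dataset supplies 2.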

deartonym commented 9 years ago

I think I have partly figured it out: I have to change the number of classes and the corresponding parameters in 'train.prototxt'.

E.g., for Caffe_net: layer 'data': param_str: "'num_classes': 2"; layer 'cls_score': "num_output: 2"; layer 'bbox_pred': "num_output: 8".

However, the problem is that if we change the structure of the network, it seems we can no longer use the 21-class PASCAL pre-trained network for fine-tuning. Does anyone have an idea about this? How can fine-tuning be made to work with a different number of classes?

Best,

WilsonWangTHU commented 9 years ago

@deartonym you must rename at least the layers that have a different shape, e.g., the layers 'bbox_pred' and 'cls_score'. Then you can fine-tune a new model.

I suggest you fine-tune from the original CaffeNet instead of the fast-rcnn CaffeNet model. In that case you can simply change layer 'data': param_str: "'num_classes': 2"; layer 'cls_score': "num_output: 2"; layer 'bbox_pred': "num_output: 8", and then start the fine-tuning.
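Why renaming helps can be sketched with a simplified model of name-based weight loading. This is not Caffe's API: `copy_pretrained`, the tiny layer shapes, and the `_2cls` layer names are all hypothetical. The only behavior assumed is that pretrained weights are matched to layers by name and that a matched layer must have the same shape.

```python
import numpy as np

def copy_pretrained(net_shapes, pretrained):
    """Load weights by layer name; a matched name must have a matching shape."""
    loaded = {}
    for name, shape in net_shapes.items():
        if name not in pretrained:
            # No layer of this name in the snapshot: keep a fresh initialization.
            loaded[name] = np.zeros(shape, dtype=np.float32)
            continue
        if pretrained[name].shape != shape:
            raise ValueError("shape mismatch for layer %r" % name)
        loaded[name] = pretrained[name].copy()
    return loaded

# Stand-in for a 21-class snapshot (feature width shrunk to 16 for brevity).
pretrained = {
    "fc7": np.ones((16, 16), dtype=np.float32),
    "cls_score": np.ones((21, 16), dtype=np.float32),
    "bbox_pred": np.ones((84, 16), dtype=np.float32),
}

# Same layer names with 2-class shapes: loading is rejected.
same_names = {"fc7": (16, 16), "cls_score": (2, 16), "bbox_pred": (8, 16)}
try:
    copy_pretrained(same_names, pretrained)
    failed = False
except ValueError:
    failed = True

# Renamed heads (hypothetical names): shared layers are copied,
# the new heads start from scratch.
renamed = {"fc7": (16, 16), "cls_score_2cls": (2, 16), "bbox_pred_2cls": (8, 16)}
loaded = copy_pretrained(renamed, pretrained)
print(failed, sorted(loaded))
```

Starting from a model that has no 'cls_score' or 'bbox_pred' layers at all falls into the same "name not found" branch, so the original layer names can be kept. If the layers are renamed instead, any Python code that looks them up by the old name would presumably need the new name as well.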

deartonym commented 9 years ago

@WilsonWangTHU Thanks for your reply! Everything works fine according to your suggestion.

parhartanvir commented 8 years ago

@deartonym could you tell me what changes you made to the network? I have tried changing the layer names, loading from a pre-trained model, and training from scratch, but I still get the same error when saving the network snapshot. I am also training to detect a single class of objects, so I have 2 object classes. Thanks!

deartonym commented 8 years ago

Hi @parhartanvir ,

Please check @WilsonWangTHU 's answer. You should change the number of classes, and the number of outputs of both the 'class score' layer and the 'bounding box prediction' layer, according to your class count. These parameters are set in the .prototxt.

Further, you also need to change the class names in the Python code. I don't quite remember the name of the .py file; it should be something like factory.py or pascal_voc.py. Just follow the code when debugging and I'm sure you can find it.
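As a rough sketch of how the class list and the .prototxt values relate: the class tuple below is hypothetical ('my_object' is a placeholder), `head_sizes` is an illustrative helper rather than part of the repository, and it assumes the background class is listed first and that each class gets 4 box-regression outputs.

```python
# Hypothetical class list for a single-object detector; the background
# class is assumed to come first.
CLASSES = ('__background__', 'my_object')

def head_sizes(classes):
    """Derive the values the .prototxt must agree with from the class list."""
    num_classes = len(classes)
    return {
        'num_classes': num_classes,    # layer 'data' param_str
        'cls_score': num_classes,      # num_output of the class score layer
        'bbox_pred': 4 * num_classes,  # 4 box coordinates per class
    }

sizes = head_sizes(CLASSES)
print(sizes)

# For comparison: background plus 20 placeholder classes gives 21 and 84.
voc_like = ('__background__',) + tuple('class_%d' % i for i in range(20))
voc_sizes = head_sizes(voc_like)
print(voc_sizes)
```

If the class list in the Python code and the three values in the .prototxt disagree, a shape mismatch like the one in the log is a likely result.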

Best,

Jonshoo commented 8 years ago

Hi @deartonym, I am now encountering this bug as well. I have changed the corresponding numbers in the three layers, but I still get the error. Could you help me further? Thanks a lot.

By the way, I am confused about the sentence "you must rename at least the layers that have a different shape" from @WilsonWangTHU. Does that mean I should change the layers' names?

ArturoDeza commented 7 years ago

I believe this is changed in the first lines of the files: /Faster-RCNN_TF/lib/networks/VGGnet_train.py and /Faster-RCNN_TF/lib/networks/VGGnet_test.py

rbgirshick / fast-rcnn

An error when doing snapshot #37