how to use py-faster-rcnn to do 4 classes training instead of 20 classes?

kaishijeng commented 8 years ago

I am able to use py-faster-rcnn to train on the VOC 2007 dataset. However, I only need 4 classes (car, person, dog and cat) instead of the 20 classes in the default setting. What changes do I need to make to train these 4 classes on the VOC 2007 data? Will a 4-class model reduce computation and the size of the trained model, and improve accuracy?

Thanks, Kaishi

PierreHao commented 8 years ago

Change all class counts from 21 to 4, and bbox_pred from 84 to 16.

martinkersner commented 8 years ago

Actually, if he has 4 classes (car, person, dog and cat), he needs to change num_output of the cls_score layer from 21 to 5 (4 classes + 1 background) and num_output of the bbox_pred layer from 84 to 20 = (4 classes + 1 background) * 4.
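The arithmetic above can be written as a tiny helper. This is a minimal sketch; `head_sizes` is a hypothetical name, and it assumes, as described in the comment above, one extra background class plus class-specific box regression with 4 coordinates per class (background included).

```python
def head_sizes(foreground_classes):
    """num_output for the cls_score and bbox_pred layers (sketch).

    Assumes one extra background class and class-specific box regression
    with 4 coordinates per class, background included.
    """
    num_classes = len(foreground_classes) + 1  # +1 for __background__
    return {"cls_score": num_classes, "bbox_pred": 4 * num_classes}


print(head_sizes(["car", "person", "dog", "cat"]))
```

For the 2-class case mentioned below (cat and dog) the same rule gives 3 and 12, and for one object plus background it gives 2 and 8.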

I also changed the classes that are loaded in lib/datasets/pascal_voc.py at line 28. In the MATLAB file VOCcode/VOCinit.m, at line 81, I changed the cell array VOCopts.classes.

When I was using py-faster-rcnn for two classes (cat and dog), I ran into a problem: training froze after 20 iterations and didn't continue even when I let it run for a few hours. I found out that this was caused by some images that contained only one label, which was marked as difficult (1). After I changed those labels to 0, everything worked fine.
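Such images can be found before training starts. A minimal sketch, assuming standard VOC XML annotations with a `<difficult>` tag per object; `only_difficult` is a hypothetical helper, not part of py-faster-rcnn. The assumption behind it is that if the loader drops difficult objects, these images are left with no boxes at all.

```python
import xml.etree.ElementTree as ET


def only_difficult(xml_text, keep=None):
    """Return True if the annotation has at least one (kept) object and all
    such objects are marked <difficult>1</difficult>.

    keep: optional set of class names to consider; other objects are ignored.
    """
    root = ET.fromstring(xml_text)
    flags = []
    for obj in root.findall('object'):
        name = obj.find('name').text.lower().strip()
        if keep is not None and name not in keep:
            continue
        diff = obj.find('difficult')
        # A missing <difficult> tag is treated as "not difficult"
        flags.append(diff is not None and int(diff.text) == 1)
    return bool(flags) and all(flags)
```

Running this over the files in Annotations/ gives the images to fix (as done above, by setting the flag to 0) or to drop from the image list.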

PierreHao commented 8 years ago

Yes, change 21 to 5 and bbox_pred to 20. I didn't use pascal_voc.py; I wrote my own code to read the images and annotations of my dataset. We should pay attention to the ground-truth values and labels, since mistakes there can cause a core dump. Thanks to rbg for faster-rcnn, it's great work.

kaishijeng commented 8 years ago

Martin,

Do you mind sharing your 2-class changes?

Thanks, Kaishi

martinkersner commented 8 years ago

I think it is pretty straightforward now that you know what to change. Just follow the instructions in my previous post.

kaishijeng commented 8 years ago

Martin

I am not sure how to change pascal_voc.py. Is the following change for 5 classes including background correct?

    self._classes = ('__background__', # always index 0
                    'car', 'cat', 'dog', 'person')

Thanks, Kaishi

martinkersner commented 8 years ago

Yes, it should be correct.

kaishijeng commented 8 years ago

Martin

I am just curious how pascal_voc.py handles images of other classes, such as airplane, that are listed in data/VOCdevkit2007/VOC2007/ImageSets/Main/trainval.txt.

Do I need to modify data/VOCdevkit2007/VOC2007/ImageSets/Main/trainval.txt to remove images that do not contain these 4 classes, or will pascal_voc.py skip those images (or assign them to the background class) automatically?

Kaishi

martinkersner commented 8 years ago

I am not sure, but I removed them.

duygusar commented 8 years ago

It should read the lists in the ImageSets <class>_train or <class>_val txt files and fetch those images, so I don't think it is necessary to delete them.
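If you do want a reduced trainval.txt, one way to build it is to take the union of the per-class lists. A minimal sketch, assuming the standard VOC per-class files ImageSets/Main/<class>_trainval.txt, where each line is `<image_id> <flag>` with 1 = class present, -1 = absent, 0 = present only as difficult objects; `images_for_classes` is a hypothetical helper.

```python
def images_for_classes(class_lists, include_difficult=False):
    """Return the sorted union of image ids that contain any of the chosen classes.

    class_lists: dict mapping class name -> text of ImageSets/Main/<class>_trainval.txt.
    Each line is "<image_id> <flag>": 1 = present, -1 = absent,
    0 = present only as difficult objects.
    """
    accepted = {1, 0} if include_difficult else {1}
    ids = set()
    for text in class_lists.values():
        for line in text.splitlines():
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or malformed lines
            image_id, flag = parts
            if int(flag) in accepted:
                ids.add(image_id)
    return sorted(ids)
```

The file contents can be read with `pathlib.Path(...).read_text()` and the result written back one id per line. Note that the selected images can still contain objects of other classes, so the annotation loader has to ignore those objects.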

tsaiJN commented 8 years ago

Has anyone succeeded in training with py-faster-rcnn? I built the py-faster-rcnn/caffe-fast-rcnn branch of Caffe, but make pytest raises errors. Also, while parsing the prototxt file, it doesn't seem to recognize the Python layer parameter "param_str". As soon as I switch back to the latest branch of Caffe, it recognizes "param_str", but there is no smooth_l1_loss layer in the latest Caffe branch :(

Have any of you encountered this problem before?

duygusar commented 8 years ago

@tsaiJN I figured out that the runtest prototxt is not modified in the Caffe fork for Faster R-CNN, so I think it is only a matter of a wrong template for running the test. You can skip that and use your build; that is what I have done, and the demo and training are working. If you really want to fix runtest, you can try this https://github.com/ShaoqingRen/SPP_net/issues/50#issuecomment-145735078. It did not work for me, but people say it did for them.

tsaiJN commented 8 years ago

@duygusar you are right, I can now run train_faster_rcnn_alt_opt.py successfully. However, the training process behaves strangely. I started training with "VGG16_faster_rcnn_final.caffemodel" as the initial weights, and the loss becomes very small early on and stays small until the end of RPN training (see the training log below).

The output proposals of stage-1 RPN training (2000 proposals per image) always seem to be very small in the second coordinate ([xmin, "ymin", xmax, ymax]). My training images are rather large, so the other coordinates are quite large, but "ymin" is always small no matter which training image is used.

Any idea what could be going wrong? P.S. I am actually training a detector for one specific object, so I changed the number of classes to 2 (object of interest vs. background). I thought fine-tuning from the faster_rcnn_final model might give me good object proposals.

    // the average bbox prediction for 4095 images, 2000 proposals each
    array([ 1.17727367e+03, 7.62899939e-01, 2.22147910e+03, 7.13566148e+02])

    // the loss seems strange, very small
    I1202 11:38:31.357440 108635 solver.cpp:242] Iteration 0, loss = 0.712915
    I1202 11:38:31.357503 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.702328 (* 1 = 0.702328 loss)
    I1202 11:38:31.357514 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.0105861 (* 1 = 0.0105861 loss)
    I1202 11:38:31.357523 108635 solver.cpp:571] Iteration 0, lr = 0.001
    I1202 11:38:50.634515 108635 solver.cpp:242] Iteration 20, loss = 0.149095
    I1202 11:38:50.634577 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.131242 (* 1 = 0.131242 loss)
    I1202 11:38:50.634588 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.0178527 (* 1 = 0.0178527 loss)
    I1202 11:38:50.634598 108635 solver.cpp:571] Iteration 20, lr = 0.001
    I1202 11:39:08.557626 108635 solver.cpp:242] Iteration 40, loss = 0.0822484
    I1202 11:39:08.557664 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.075185 (* 1 = 0.075185 loss)
    I1202 11:39:08.557674 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.00706339 (* 1 = 0.00706339 loss)
    I1202 11:39:08.557682 108635 solver.cpp:571] Iteration 40, lr = 0.001
    I1202 11:39:30.433254 108635 solver.cpp:242] Iteration 60, loss = 0.128716
    I1202 11:39:30.433310 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.0821368 (* 1 = 0.0821368 loss)
    I1202 11:39:30.433328 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.0465792 (* 1 = 0.0465792 loss)
    I1202 11:39:30.433338 108635 solver.cpp:571] Iteration 60, lr = 0.001
    I1202 11:39:51.051041 108635 solver.cpp:242] Iteration 80, loss = 0.715471
    I1202 11:39:51.051089 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.237801 (* 1 = 0.237801 loss)
    I1202 11:39:51.051100 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.47767 (* 1 = 0.47767 loss)
    I1202 11:39:51.051108 108635 solver.cpp:571] Iteration 80, lr = 0.001
    I1202 11:40:11.246857 108635 solver.cpp:242] Iteration 100, loss = 0.0347638
    I1202 11:40:11.246898 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.0285856 (* 1 = 0.0285856 loss)
    I1202 11:40:11.246908 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.00617825 (* 1 = 0.00617825 loss)
    I1202 11:40:11.246917 108635 solver.cpp:571] Iteration 100, lr = 0.001
    I1202 11:40:29.282299 108635 solver.cpp:242] Iteration 120, loss = 0.0496501
    I1202 11:40:29.282351 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.0458244 (* 1 = 0.0458244 loss)
    I1202 11:40:29.282361 108635 solver.cpp:258] Train net output #1: rpn_loss_bbox = 0.00382573 (* 1 = 0.00382573 loss)
    I1202 11:40:29.282368 108635 solver.cpp:571] Iteration 120, lr = 0.001
    I1202 11:40:46.898138 108635 solver.cpp:242] Iteration 140, loss = 0.0386
    I1202 11:40:46.898181 108635 solver.cpp:258] Train net output #0: rpn_cls_loss = 0.0338518 (* 1 = 0.0338518 loss)
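To look at the trend rather than at single lines (for example the spike at iteration 80), the total loss can be pulled out of such a log. A minimal sketch; the regular expression assumes the `Iteration N, loss = X` format of the solver.cpp:242 lines shown above, and `parse_losses` is a hypothetical helper.

```python
import re

# Matches e.g. "Iteration 20, loss = 0.149095" but not "Iteration 20, lr = 0.001"
LOSS_RE = re.compile(r"Iteration (\d+), loss = ([-+0-9.eE]+)")


def parse_losses(log_text):
    """Return [(iteration, total_loss), ...] from a Caffe training log."""
    return [(int(m.group(1)), float(m.group(2))) for m in LOSS_RE.finditer(log_text)]
```

The resulting pairs can be plotted or averaged over a window to see whether the loss is really converging or just noisy.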

duygusar commented 8 years ago

@tsaiJN I am actually working on the same problem, a detector for one specific object (1 class + bg), although so far I have been experimenting on their data. On the contrary, I thought initializing with their model would not work for me: I am not sure, but since I have a smaller dataset of one specific object, I decided not to use their pretrained model. It is interesting, though, that the loss converges so quickly. I haven't attempted training on my dataset yet; I am still looking into how to train ZF from scratch :/

I am not sure, but have you also changed bbox_pred in the prototxt files? It should also be set according to the number of classes, so 8 (2 classes * 4 box coordinates), I guess.

tsaiJN commented 8 years ago

@duygusar Haha, great to hear someone is working on the same problem :) Yes, I have changed bbox_pred in the prototxt files to 8 outputs. Did your cls_loss for stage-1 RPN training start small (around 0.6~0.7) and quickly converge to around 0.00X too? I have tried training both from scratch and from faster_rcnn_pretrained, and both start from a loss around 0.6; faster_rcnn_pretrained ends up with cls_loss around 0.00X, while training from scratch ends up with 0.0X. But looking at their proposals, I don't think they've learned anything :p

duygusar commented 8 years ago

@tsaiJN I will try this and let you know how it goes, if you haven't resolved it already. I ran into problems with my dataset even though I converted it to VOC format. I also tried to use my old Caffe model, but it failed to transfer the weights :( I need to learn about net surgery between different network architectures, or train a new ZF. What are your specs and training time for VGG, by the way? I am using ZF only because I have a Tesla with compute capability 2.0.

kshalini commented 8 years ago

@duygusar, @tsaiJN

Did you manage to get your training done? I would like to train on my own data, and I would mostly be fine-tuning CaffeNet.

One particular question: if I have 50 categories and 100 images in each category, should I generate bounding-box proposals for all 5000 (50 x 100) images using selective search and use them as input for training? I am asking specifically because, as the number of images and categories grows, I am worried about the size of the generated .mat file (with the bounding-box proposals).

Can you shed some light on this, please?

tsaiJN commented 8 years ago

@duygusar I just modified and ran "experiment/scripts/train_faster_rcnn_alt_opt.sh", still using the faster_rcnn branch of Caffe. Magically, it starts training without any error. I still can't figure out why running that script starts training while running train_faster_rcnn_alt_opt.py does not :\

As for transferring weights, I think that if you are using a later version of Caffe, the model upgrade should be done by Caffe automatically.

I fine-tune from the final model used by demo.py, with the VGG16 model as defined in models/VGG16/faster_rcnn/* (changing num_class to 2 and bbox_pred to 8).

Still can't manage to solve the not-training problem :(

@kshalini I am not sure I understand what you mean, but if you are asking whether we should use selective search to train Fast R-CNN: I don't think we need selective search to train Faster R-CNN. The RPN layer is responsible for the bbox proposal job instead. The four-step training (alternating optimization in the paper) first trains the RPN, then trains Fast R-CNN on proposals generated by the RPN, then fine-tunes the RPN, and finally fine-tunes Fast R-CNN. No selective search is involved.
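The four steps can be summarized as a small data structure. This is a descriptive sketch based on the alternating-optimization description in the Faster R-CNN paper, not the code in train_faster_rcnn_alt_opt.py; `Stage` and `alt_opt_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stage:
    step: int
    net: str                  # network trained in this step
    init_from: str            # where its initial weights come from
    rois_from: Optional[str]  # proposals used as training RoIs (Fast R-CNN steps only)
    shared_conv_frozen: bool  # whether the shared conv layers are kept fixed


def alt_opt_schedule() -> List[Stage]:
    """The 4-step alternating training as described in the Faster R-CNN paper."""
    return [
        Stage(1, "RPN", "ImageNet-pretrained model", None, False),
        Stage(2, "Fast R-CNN", "ImageNet-pretrained model", "RPN of step 1", False),
        Stage(3, "RPN", "Fast R-CNN of step 2", None, True),
        Stage(4, "Fast R-CNN", "shared conv layers of step 3", "RPN of step 3", True),
    ]


for stage in alt_opt_schedule():
    print(stage)
```

The point the table makes is that proposals always come from an RPN trained earlier in the schedule, so no external proposal method is needed.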

duygusar commented 8 years ago

@kshalini Yes, Faster R-CNN doesn't use selective search; the whole point of the paper is to avoid it. You can find more detail in the paper.

duygusar commented 8 years ago

@tsaiJN Yeah, I think my Caffe model is old, it has been a while. I can retrain one using the new version, but doesn't the architecture need to be the same? Or don't I need to rename layers or something? The initial model might also use a different input size.

Sorry, I am using the MATLAB version. I am currently running with their pretrained model as the initializer. The loss actually started around 0.4, kept dropping gradually to 0.3 and 0.2, and then jumped to 0.6; the whole pattern is like this, gradually going down but with a lot of spikes. The first-stage R-CNN started around 0.1, though, which is very small. I highly doubt this will work, but I am letting it run until I have my initial model.

kshalini commented 8 years ago

Thanks for the input, @tsaiJN and @duygusar.

I was going by the instructions for Fast R-CNN, where we have to modify factory.py and a few other files, and also run selective search.

So, just to clarify again, to run training here: a) we don't need to modify any Python files, and b) editing the configs in faster_rcnn_end2end.sh is enough?

Does this mean it learns to propose regions and also trains CaffeNet on the same dataset? I am asking explicitly because I understood that learning to propose regions could happen on one dataset (say VOC2007) and the CaffeNet training (or fine-tuning) on another (our own custom data?).

wangdelp commented 8 years ago

@duygusar @tsaiJN @kaishijeng I encountered an error when trying to train the detection model on a subset of the PASCAL VOC classes.

I have:

modified pascal_voc.py (https://github.com/wangdelp/py-faster-rcnn/blob/train_IKEA_cls/lib/datasets/pascal_voc.py#L29) to include only the 6 classes that I want;
changed num_classes to 7 (6 classes + 1 background, https://github.com/wangdelp/py-faster-rcnn/blob/train_IKEA_cls/models/VGG16/faster_rcnn_end2end/train.prototxt#L11);
changed the bbox num_output to 28 (4*7, https://github.com/wangdelp/py-faster-rcnn/blob/train_IKEA_cls/models/VGG16/faster_rcnn_end2end/train.prototxt#L643).

Any idea what else I need to modify? Thank you.

duygusar commented 8 years ago

@wangdelp I actually used the MATLAB version, but I think that part is similar to what I did in VOCinit.m: instead of ('background', 'bottle', 'chair', 'diningtable', 'pottedplant', 'sofa', 'tvmonitor'), I have only ('bottle', 'chair', 'diningtable', 'pottedplant', 'sofa', 'tvmonitor'), without the background class (adapted here to your dataset). I am not sure how it works in the Python implementation, though. I remember a more detailed thread here, so maybe you can find it in other open or closed issues.

ericromanenghi commented 8 years ago

Removing "background" is not a good idea; you need this class. Note that the classes object works like a map between class indices and class names.
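The index/name map, and one way to ignore objects of classes you are not training on, can be sketched as follows. This assumes VOC XML annotations; the class tuple is the 4-class example from earlier in this thread, and `load_objects` is a hypothetical helper, not py-faster-rcnn's own loader. Without such a filter, a lookup like `CLASS_TO_IND['aeroplane']` would raise a KeyError.

```python
import xml.etree.ElementTree as ET

# Index 0 is reserved for the background class
CLASSES = ('__background__', 'car', 'cat', 'dog', 'person')
CLASS_TO_IND = dict(zip(CLASSES, range(len(CLASSES))))


def load_objects(xml_text, class_to_ind=CLASS_TO_IND):
    """Return (boxes, labels) for objects whose class is kept; others are skipped."""
    root = ET.fromstring(xml_text)
    boxes, labels = [], []
    for obj in root.findall('object'):
        name = obj.find('name').text.lower().strip()
        if name not in class_to_ind:  # e.g. 'aeroplane' when only 4 classes are kept
            continue
        bb = obj.find('bndbox')
        # VOC pixel coordinates are 1-based; shift them to 0-based
        coords = [float(bb.find(t).text) - 1 for t in ('xmin', 'ymin', 'xmax', 'ymax')]
        boxes.append(coords)
        labels.append(class_to_ind[name])
    return boxes, labels
```

Here `CLASSES[i]` gives the name for index `i` and `CLASS_TO_IND[name]` the reverse, which is why background has to stay at position 0.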

duygusar commented 8 years ago

@eternautaCAT I only suggested it for the MATLAB version in VOCinit.m; you can see here that this is how it is originally used in VOC: http://vision.cs.utexas.edu/voc/VOCcode/VOCinit.m. I don't know how it works in the Python version; thanks for clarifying that for @wangdelp.

ericromanenghi commented 8 years ago

No problem!

Endeavour116 commented 8 years ago

Hi everyone. I downloaded py-faster-rcnn yesterday and used it to train on my own dataset. It works well, and I would like to share my modifications to this kit. If you are interested, send a mail to my email address and I will give you a note file describing the modifications; you can follow my suggestions to train on your own dataset. 2006endeavour@163.com

andrewliao11 commented 8 years ago

Hi @wangdelp, I don't know if you've solved the problem; if you are still struggling with it, you might want to follow this tutorial: https://github.com/andrewliao11/py-faster-rcnn/blob/master/README.md

wangdelp commented 8 years ago

@andrewliao11 Thank you, that helps a lot.

wangdelp commented 8 years ago

@andrewliao11 Hi Andrew, I am checking out your code to train the ImageNet detector, but it sounds like it requires MATLAB, for which I do not have a license. Do we really need MATLAB to run the training? Thank you.

andrewliao11 commented 8 years ago

MATLAB is not necessary. The .mat file comes from the ImageNet dataset and denotes which categories are in your dataset.

wangdelp commented 8 years ago

@andrewliao11 I am running your code and encounter an error because some of the images have no bbox information, for example ILSVRC2012_val_00000109.xml. How do you deal with this? Also, are val1.txt and val2.txt downloaded from https://github.com/rbgirshick/rcnn/blob/master/data/splits/ilsvrc13/val1.txt.tgz? Thank you.
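One way to deal with annotations that have no boxes is to leave those images out of the image list. A minimal sketch, assuming one VOC/ImageNet-style XML file per image whose file stem is the image id; `ids_with_boxes` is a hypothetical helper.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def ids_with_boxes(annotation_dir):
    """Return sorted image ids whose XML has at least one <object> with a <bndbox>."""
    keep = []
    for xml_path in sorted(Path(annotation_dir).glob('*.xml')):
        root = ET.parse(xml_path).getroot()
        if any(obj.find('bndbox') is not None for obj in root.iter('object')):
            keep.append(xml_path.stem)
    return keep
```

The returned ids can then be written out as the training list, so that images like ILSVRC2012_val_00000109 never reach the data layer.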

abhisheksgumadi commented 8 years ago

Hi, just a quick question. How should the bounding boxes be specified for the background class? Should we have explicit images for the background class and also specify ground-truth bounding boxes for them?
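In Fast R-CNN-style training, background examples are usually not annotated at all; they are proposals that overlap every ground-truth box too little. A minimal sketch of that labeling, assuming the thresholds from the Fast R-CNN paper (foreground IoU >= 0.5, background IoU in [0.1, 0.5)); py-faster-rcnn has similar settings in lib/fast_rcnn/config.py, so check that file for the actual names and values. `iou` and `label_rois` are hypothetical helpers.

```python
def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes (continuous coordinates)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    return inter / (area(a) + area(b) - inter)


def label_rois(rois, gt_boxes, gt_labels, fg_thresh=0.5, bg_lo=0.1, bg_hi=0.5):
    """Label each proposal: the class of its best-matching ground-truth box if
    IoU >= fg_thresh, 0 (background) if IoU falls in [bg_lo, bg_hi),
    and None (not used for training) otherwise."""
    labels = []
    for roi in rois:
        overlaps = [iou(roi, g) for g in gt_boxes]
        if overlaps:
            best = max(range(len(overlaps)), key=overlaps.__getitem__)
            best_iou = overlaps[best]
        else:
            best, best_iou = None, 0.0
        if best is not None and best_iou >= fg_thresh:
            labels.append(gt_labels[best])
        elif bg_lo <= best_iou < bg_hi:
            labels.append(0)
        else:
            labels.append(None)
    return labels
```

Under this scheme you only annotate the objects; background boxes come for free from the proposals.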

medhani commented 7 years ago

I just need to train the network for one class. How can I label the data?
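One common route, used by several people in this thread, is to store the labels in PASCAL VOC format so the existing loaders can read them. A minimal sketch that writes one VOC-style XML annotation with Python's xml.etree; `voc_annotation` and the class name `widget` are hypothetical, and only the fields the loaders discussed here typically read (size, name, difficult, bndbox) are included.

```python
import xml.etree.ElementTree as ET


def voc_annotation(filename, width, height, objects, depth=3):
    """Build a minimal VOC-style annotation as an XML string.

    objects: list of (class_name, (xmin, ymin, xmax, ymax)) in 1-based pixel coordinates.
    """
    root = ET.Element('annotation')
    ET.SubElement(root, 'filename').text = filename
    size = ET.SubElement(root, 'size')
    for tag, val in (('width', width), ('height', height), ('depth', depth)):
        ET.SubElement(size, tag).text = str(val)
    for name, (xmin, ymin, xmax, ymax) in objects:
        obj = ET.SubElement(root, 'object')
        ET.SubElement(obj, 'name').text = name
        ET.SubElement(obj, 'difficult').text = '0'
        box = ET.SubElement(obj, 'bndbox')
        for tag, val in zip(('xmin', 'ymin', 'xmax', 'ymax'), (xmin, ymin, xmax, ymax)):
            ET.SubElement(box, tag).text = str(int(val))
    return ET.tostring(root, encoding='unicode')


print(voc_annotation('000001.jpg', 640, 480, [('widget', (10, 20, 110, 220))]))
```

Each string would be saved as Annotations/<id>.xml, the id added to the image list, and the classes tuple set to ('__background__', 'widget').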

Anhaoxu commented 7 years ago

@kaishijeng How did you get the subset of classes from VOC? Thanks for sharing!

rbgirshick / py-faster-rcnn, issue #1