GBJim opened this issue 8 years ago

I am currently training Faster R-CNN on pedestrian data and trying to build a pedestrian detector (one class plus background). Since my data set has some images without any pedestrians in them, how do I annotate them? Should I assign the whole image as background (one bounding box covering the whole image)?
@GBJim that is a good question! I am also interested in the answer.
For a binary classifier I have read this tutorial based on the INRIA Person dataset: README
Looking at the INRIA dataset, I think you need to generate some random boxes that aren't pedestrians. If I understand the code correctly, the background boxes are generated automatically.
Hi @Austriker I tried assigning the background class to the whole image for negative examples, but I got a floating point exception (C++) during the training process.
Let me read the materials you provided and see if I can come up with a solution.
@GBJim have you found out how to do it?
If I have understood correctly, when you train Fast R-CNN you have to run selective_search beforehand to create background proposals. But with Faster R-CNN the RPN layer does that by itself, so technically speaking we don't need to add negative examples.
I started to train on my set with rcnn_alt_opt but rpn_loss_bbox is really unstable! It moves between 12 and 0.3. What should I do?
Hi @Austriker I am not sure what the solution to your unstable rpn_loss_bbox is. The only thing I know is that the whole alternating training takes a long time and repeats several times (correct me if I am wrong). Maybe you need more iterations or more training data to solve your problem.
About the negative examples issue: my situation is that half of my training data does not contain any foreground objects (technically it contains lots of __background__ objects), which implies that Faster R-CNN ignores half of the data during training.
If there is a way to make Faster R-CNN use that negative half of the data, it should help improve the model's performance. This is my assumption; I am still working on how to make Faster R-CNN do that.
@GBJim I have the same issue with my dataset. Every image with no bbox is removed from the set. I was thinking of using selective_search to generate background boxes and merging the resulting roidb into the dataset.
@Austriker That sounds like a good idea! I am not familiar with selective search tools because I stepped into this research field directly with Faster R-CNN. Any suggestions or tutorial materials to get started with selective search?
Roidb entries without any RoIs will be removed from the training set by filter_roidb. We can comment out this check on RoIs to allow background images.
Note that you may want to choose the anchors with higher box scores as hard negative examples for RPN. Besides, the hyperparameters may need to be tuned on account of the growing number of negative batches (batches without any positive examples).
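For reference, a minimal sketch of how the check in lib/fast_rcnn/train.py's filter_roidb could be relaxed; the is_valid structure below is assumed from the stock code and may differ by version:

```python
# Sketch only: a relaxed is_valid for filter_roidb in lib/fast_rcnn/train.py,
# assuming the stock structure of that function.
import numpy as np
from fast_rcnn.config import cfg

def is_valid(entry):
    overlaps = entry['max_overlaps']
    # Foreground RoIs: overlap with a ground truth box above FG_THRESH.
    fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
    # Background RoIs: overlap within [BG_THRESH_LO, BG_THRESH_HI).
    bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                       (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
    # Stock check keeps an entry only if it has fg or bg RoIs.
    # Also keeping entries with no boxes at all lets pure-background
    # images survive the filtering step.
    return len(fg_inds) > 0 or len(bg_inds) > 0 or len(entry['boxes']) == 0
```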
> I got a floating point exception (C++) during the training process.

py-faster-rcnn calls top.reshape in several layers. Calling blob.reshape([..., 0, ...]) leads to a floating point exception. Any chance of your data layers generating empty batches?
@manipopopo Thank you for sharing this detailed information!
My goal is to enable Faster R-CNN to learn from those background images (no bounding boxes), so I think I cannot skip those lines in anchor_target_layer and roi_data_layer. I think skipping these lines is equivalent to directly removing background images from the training data.
Instead, I need to generate (maybe randomly) some bounding boxes for background images and feed them into Faster R-CNN.
@GBJim For selective search:
Paper: http://koen.me/research/pub/uijlings-ijcv2013-draft.pdf
Python implementation: https://github.com/AlpacaDB/selectivesearch
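For what it's worth, a minimal usage sketch of that Python package (call signature and the 'rect' field are as documented in the AlpacaDB README; the file path is just a placeholder):

```python
# Sketch: generate candidate boxes on a background image with the
# AlpacaDB selectivesearch package.
import skimage.io
import selectivesearch

img = skimage.io.imread('background_image.jpg')  # placeholder path
_, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9, min_size=10)

# Convert the (x, y, w, h) rects into (x1, y1, x2, y2) boxes, dropping
# degenerate ones; these could then be merged into the roidb as
# __background__ proposals.
boxes = []
for r in regions:
    x, y, w, h = r['rect']
    if w > 0 and h > 0:
        boxes.append((x, y, x + w, y + h))
```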
@manipopopo Thanks, it's very interesting.
I think the solution should be to tweak filter_roidb to avoid removing images without any bbox.
@GBJim
All anchors are still negative examples for RPN training, as we don't remove these lines, and the RPN still processes background images and generates proposals. These proposals can be negative (__background__) training examples for the rcnn classification sub-network as long as your imdb's rpn_roidb method doesn't get rid of the records corresponding to background images.

> I need to generate (maybe randomly) some bounding boxes for background images

Do the randomly generated bounding boxes play a similar role to the RPN proposals?
@manipopopo A-ha! I think I understand what you mean now. If those lines in anchor_target_layer and roi_data_layer are skipped, the negative anchors and proposals can be generated without annotating any background image. Am I correct?
@GBJim I think you just need to edit the filter_roidb function and it will do.
@Austriker Let me try :+1:
@GBJim

> If those lines in anchor_target_layer and roi_data_layer are skipped, the negative anchors and proposals can be generated without annotating any background image. Am I correct?

In this case, yes. Those lines in roi_data_layer and anchor_target_layer are skipped since we don't have ground truth boxes: no bbox_overlaps needs to be estimated, and no positive (or fg) example exists. And rpn_roidb of your imdb doesn't get rid of the records corresponding to background images.

@manipopopo I tried to follow your instructions, but I encountered KeyError: 'boxes' at imdb.py line 106.
In my customized imdb class, an empty dictionary is returned when the process asks for the annotation of a negative example. Should I simply insert a skip at imdb.py line 106 to avoid this error?
The flipped roidb entry of self.roidb[i] and self.roidb[i] are almost the same thing, except that entry['boxes'] contains the flipped self.roidb[i]['boxes'] and entry['flipped'] is True.

Since there is no ground truth box in a background roidb[i], the only thing we have to do is make sure the flag entry['flipped'] is set to True. You can do whatever you want as long as the structures of normal entries and flipped entries are consistent.
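One way to avoid that KeyError while keeping the flipping code unchanged is to have the custom imdb's annotation loader return a structurally complete entry with zero-length arrays instead of an empty dict. A sketch, with keys and dtypes borrowed from pascal_voc.py (adapt to your own imdb):

```python
# Sketch: a roidb entry for an image with no objects. Same keys and dtypes
# as a normal entry, just with zero rows, so append_flipped_images and the
# later roidb preparation steps work without special cases.
import numpy as np
import scipy.sparse

def _load_empty_annotation(num_classes):
    return {'boxes': np.zeros((0, 4), dtype=np.uint16),
            'gt_classes': np.zeros((0,), dtype=np.int32),
            'gt_overlaps': scipy.sparse.csr_matrix(
                np.zeros((0, num_classes), dtype=np.float32)),
            'flipped': False,
            'seg_areas': np.zeros((0,), dtype=np.float32)}
```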
@manipopopo
I am reading those lines you suggested skipping, but I am wondering if that is appropriate. My training data contains both positive and negative examples. It seems like your modifications would treat all training data as negative examples. Is that correct?
Oh, I meant the lines are skipped when there is no ground truth bounding box. For example: if len(gt_boxes): do the following lines.
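To make that concrete, a sketch of what such a guard could look like inside AnchorTargetLayer.forward(); the variable names follow the stock lib/rpn/anchor_target_layer.py, but treat this as an illustration rather than a drop-in patch:

```python
# Sketch: label assignment with and without ground truth boxes.
# (np, cfg and bbox_overlaps are already imported by the stock layer.)
labels = np.empty((len(inds_inside),), dtype=np.float32)
labels.fill(-1)  # -1 = don't care

if len(gt_boxes) > 0:
    # Stock path: overlaps between anchors and ground truth decide the labels.
    overlaps = bbox_overlaps(
        np.ascontiguousarray(anchors, dtype=np.float64),
        np.ascontiguousarray(gt_boxes, dtype=np.float64))
    argmax_overlaps = overlaps.argmax(axis=1)
    max_overlaps = overlaps[np.arange(len(inds_inside)), argmax_overlaps]
    labels[max_overlaps < cfg.TRAIN.RPN_NEGATIVE_OVERLAP] = 0
    labels[max_overlaps >= cfg.TRAIN.RPN_POSITIVE_OVERLAP] = 1
else:
    # Background image: every anchor inside the image is a negative example.
    labels.fill(0)
```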
Hi @manipopopo Following your instructions, I encountered an error at line 117 of anchor_target_layer.py: bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
NameError: global name 'argmax_overlaps' is not defined
Because lines 132 to 162 are skipped for negative examples, argmax_overlaps is not defined. I am still thinking about how to solve this correctly. Any ideas?
By the way, it seems like lines 93 to 103 of minibatch.py, which you suggested skipping, don't matter, because the parent function _sample_rois is only called when cfg.TRAIN.HAS_RPN is set to False.
@manipopopo I am stuck at lines 33-44 of minibatch.py. It seems like get_minibatch only returns the image and the positive bounding boxes in the blob dictionary. I tried to assign negative examples a zero array, np.zeros((1, 5), dtype=np.float32), and I got a floating point exception.
Since there is no ground truth box (gt_boxes) within background images, the bbox_targets, bbox_inside_weights and bbox_outside_weights can be set to zero arrays directly, without calling bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :]). Only positive RPN examples and positive rcnn classification sub-network examples can have a non-zero smooth L1 loss.

_sample_rois in minibatch.py is used to generate training data for the rcnn classification sub-network under the alternating training strategy. If you are experimenting with the end-to-end training strategy, you can safely ignore it.

Setting blobs['gt_boxes'] to a zero array is OK. (However, I would use [0, 0, 0, 0, DUMMY_VALUE] to let me distinguish between real ground truth boxes and the dummy box, with the last element being DUMMY_VALUE.)
The floating point exception can be attributed to the function call blob.reshape(..., 0, ...). Do the top blobs of your roi_data_layer, anchor_target_layer and proposal_target_layer all have valid shapes?
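For completeness, a sketch of the target computation described above for the no-ground-truth case, assuming the stock variable names in anchor_target_layer.py and combined with the label guard sketched earlier:

```python
# Sketch: bbox regression targets when there are no ground truth boxes.
# (np and cfg come from the stock layer imports.)
if len(gt_boxes) > 0:
    bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :])
else:
    # No positives exist, so the targets can be all zeros and
    # argmax_overlaps is never needed.
    bbox_targets = np.zeros((len(inds_inside), 4), dtype=np.float32)

bbox_inside_weights = np.zeros((len(inds_inside), 4), dtype=np.float32)
# Only anchors labelled 1 get non-zero weights; on a background image this
# mask is empty, so the smooth L1 loss contribution stays zero.
bbox_inside_weights[labels == 1, :] = np.array(cfg.TRAIN.RPN_BBOX_INSIDE_WEIGHTS)
```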
@manipopopo
Thank you for the explanations.
I am wondering about the DUMMY_VALUE. Isn't this value always zero?
Since the last value of blobs['gt_boxes'] represents the class, the class of a negative example must be zero. Is that correct?
For me, setting DUMMY_VALUE to 0 means the box is a __background__ bounding box provided by the data set. So I'll set DUMMY_VALUE to -1 to mark the box as a dummy record, which is created to prevent the top blob from being empty. You can set DUMMY_VALUE to whatever you like as long as you can distinguish the dummy box from ground truth object proposals.
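As a concrete illustration of that dummy record, a sketch of how blobs['gt_boxes'] could be filled in lib/roi_data_layer/minibatch.py, using DUMMY_VALUE = -1; the normal branch follows the stock get_minibatch code:

```python
# Sketch: gt_boxes blob with a dummy row for background images.
DUMMY_VALUE = -1  # anything that cannot be a real class index

gt_inds = np.where(roidb[0]['gt_classes'] != 0)[0]
if len(gt_inds) > 0:
    gt_boxes = np.empty((len(gt_inds), 5), dtype=np.float32)
    gt_boxes[:, 0:4] = roidb[0]['boxes'][gt_inds, :] * im_scales[0]
    gt_boxes[:, 4] = roidb[0]['gt_classes'][gt_inds]
else:
    # One dummy row keeps the top blob non-empty (reshaping a blob to a
    # zero dimension is what triggers the floating point exception).
    gt_boxes = np.array([[0, 0, 0, 0, DUMMY_VALUE]], dtype=np.float32)
blobs['gt_boxes'] = gt_boxes
```

Layers that consume gt_boxes (anchor_target_layer, proposal_target_layer) would then need to drop rows whose last element is DUMMY_VALUE before treating them as ground truth.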
Thank you @manipopopo! My training process is working now. If my testing results improve, I will share my modifications here :)
@manipopopo I am reading the paper you referenced earlier: Training Region-based Object Detectors with Online Hard Example Mining. If my understanding of hard example mining is correct, the idea is to intentionally sample proposals with higher losses (hard examples) during training, and use these selected proposals for backpropagation.
This is a promising approach, but how do I implement it? As you mentioned earlier:

> you may want to choose the anchors with higher box scores as hard negative examples for RPN

But I don't have the loss value for each anchor (or proposal). The final losses are calculated in Caffe's SoftmaxWithLoss and SmoothL1Loss layers. I guess I need to customize new loss layers that integrate with the RPN in order to implement the paper's Online Hard Example Mining. Is this correct?
I am going to look into the other material you referenced, R-FCN, for more detail :)
Really appreciate all the information you provide!
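Note that the simpler suggestion quoted above (prefer the highest-scoring anchors as negatives) is not the OHEM procedure from the paper, which re-scores RoIs with a read-only forward pass of the Fast R-CNN head. That simpler idea could replace the random negative subsampling in anchor_target_layer.py. In this sketch, fg_scores is assumed to be a per-anchor objectness score extracted from the rpn_cls_score bottom blob (the stock layer only uses that blob for its shape), so this is an illustration of the idea, not working drop-in code:

```python
# Sketch: hard-negative subsampling of RPN anchors by predicted objectness.
num_bg = cfg.TRAIN.RPN_BATCHSIZE - np.sum(labels == 1)
bg_inds = np.where(labels == 0)[0]
if len(bg_inds) > num_bg:
    # The stock code disables a random subset; instead keep the negatives
    # the RPN currently scores highest as "object" (the hardest ones).
    order = fg_scores[bg_inds].argsort()[::-1]
    keep = bg_inds[order[:num_bg]]
    disable_inds = np.setdiff1d(bg_inds, keep)
    labels[disable_inds] = -1
```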
Hi all: After verifying the results of negative-enabled training, I sadly found out that the precision is lower than when just ignoring all negative examples.
As @manipopopo suggested, I modified minibatch.py to balance the negative and positive examples during training. Consecutive negative examples may lead the SGD descent in the wrong direction. I am running new experiments to verify this.
When iter_size=2 and the parameter update is evaluated on one normal image (which contains at least one ground truth object bounding box) plus one background image, you could increase cfg.TRAIN.RPN_FG_FRACTION (cfg.TRAIN.FG_FRACTION) and lower cfg.TRAIN.RPN_BATCHSIZE (cfg.TRAIN.BATCH_SIZE) to balance the amounts of foreground and background training examples manually.

However, tuning hyperparameters might be a time-consuming process. If your GPU memory is large enough, you can try concatenating foreground images with background images, and train the network on the concatenated results with a larger cfg.TRAIN.MAX_SIZE and cfg.TRAIN.SCALES.

Besides, you can try doing stage 3 and stage 4 of the alternating training strategy for some epochs after the end-to-end training.
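A sketch of what such an override could look like in code, with key names from py-faster-rcnn's lib/fast_rcnn/config.py; the concrete values are placeholders, not recommendations:

```python
# Sketch: tilting the fg/bg balance via the config, e.g. from a training
# script after the default cfg has been loaded. Values are illustrative only;
# the commented defaults are those of the stock config.py.
from fast_rcnn.config import cfg

# RPN minibatch: fewer anchors per image, larger share of positives.
cfg.TRAIN.RPN_BATCHSIZE = 128      # default 256
cfg.TRAIN.RPN_FG_FRACTION = 0.7    # default 0.5

# Fast R-CNN head minibatch: fewer RoIs, larger share of foreground RoIs.
cfg.TRAIN.BATCH_SIZE = 64          # default 128
cfg.TRAIN.FG_FRACTION = 0.5        # default 0.25
```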
@GBJim what is your result now?
@xiaoxiongli Unfortunately, the precision is hurt by negative-enabled training. In detail, my training process does not balance the ratio of positive and negative input images.
I guess that a 10:1 ratio of positive:negative images could be a good starting point to see whether negative-enabled training can really improve the overall performance.
Currently, I am working on another project. If you want to experiment with it, just follow @manipopopo's instructions in this thread :)
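For anyone who wants to try that ratio, a simple sketch of capping the number of background-only roidb entries before training (hypothetical helper; adjust the ratio to taste):

```python
# Sketch: keep roughly one background-only entry per ten positive entries.
import random

def balance_roidb(roidb, max_neg_ratio=0.1):
    pos = [r for r in roidb if len(r['boxes']) > 0]
    neg = [r for r in roidb if len(r['boxes']) == 0]
    random.shuffle(neg)
    keep_neg = neg[:int(len(pos) * max_neg_ratio)]
    balanced = pos + keep_neg
    random.shuffle(balanced)
    return balanced
```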
This issue is very useful, @GBJim, thank you!
A question in general that I have is: am I supposed to just not include the images that don't contain the class in training, if I am doing pedestrian detection (for example)? I would assume that this isn't a good idea. Moreover, what should the .xml file of a non-pedestrian (or no target present) image look like?
<?xml version="1.0" encoding="utf-8"?>
<annotation>
<filename>
<item>./SCORCH_Stimuli/Set1_XML/1.png</item>
</filename>
<folder>scorch</folder>
<object>
<bndbox>
<xmin>144</xmin>
<ymin>547</ymin>
<xmax>169</xmax>
<ymax>585</ymax>
</bndbox>
<name>background</name>
</object>
</annotation>
It also seems like putting a single bounding box will limit the negative-class search space (one bounding box per negative image, vs. 100 in the image). I see there has been some discussion on this already in the thread, but I am wondering what role the .xml files play for the negative classes.
@ArturoDeza You need to modify the code to accept the negative images. In my case, if an image does not have a corresponding annotation (a negative image), its bounding box data is assigned a sentinel value (e.g. None). Then you still need to modify the downstream processing to handle these exceptional bboxes.
This thread has the details of how to modify the code.
@GBJim I have the same issue you had. I want to train the network with, among others, images that do not contain any annotations. I read the thread but do not understand every aspect.
Could you summarize the code changes step by step? It would be very helpful to all the people who have similar issues with their own datasets. Thanks in advance.