rbgirshick / py-faster-rcnn

Faster R-CNN (Python implementation) -- see https://github.com/ShaoqingRen/faster_rcnn for the official MATLAB version

Annotations of negative examples. #231

Open GBJim opened 8 years ago

GBJim commented 8 years ago

I am currently training Faster R-CNN on pedestrian data, trying to build a pedestrian detector (one class plus background). Since my dataset has some images without any pedestrians in them, how do I annotate them? Should I assign the whole image as background (one bounding box covering the whole image)?

Austriker commented 8 years ago

@GBJim that is a good question! I am also interested in the answer.

For a binary classifier, I have read this tutorial based on the INRIA Person dataset: README

Looking at the INRIA dataset, I think you need to add some random boxes that aren't pedestrians. If I understand the code correctly, the background boxes are generated automatically.

GBJim commented 8 years ago

Hi @Austriker I tried assigning the background class to the whole image in negative examples, but I got a floating point exception (C++) during the training process.

Let me read the materials you provided and see if I can come up with a solution.

Austriker commented 8 years ago

@GBJim have you found out how to do it?

If I have understood correctly, when you train Fast R-CNN you have to run selective_search beforehand to create some background proposals, but with Faster R-CNN the RPN layer does this by itself. So technically speaking we don't need to add negative examples.

I started to train on my set with rcnn_alt_opt, but rpn_loss_bbox is really unstable! It moves between 12 and 0.3. What should I do?

GBJim commented 8 years ago

Hi @Austriker I am not sure what the solution to your unstable rpn_loss_bbox is. The only thing I know is that the whole alternating training takes a long time and repeats several stages (correct me if I am wrong). Maybe you need more iterations or more training data to solve your problem.

About the negative examples issue: my situation is that half of my training data does not contain any foreground objects (technically, it contains lots of __background__ objects), which implies that Faster R-CNN ignores half of the data during training.

If there were a way to make Faster R-CNN use that negative half of the data, it should help improve the model's performance. This is my assumption, and I am still working on how to make Faster R-CNN do it.

Austriker commented 8 years ago

@GBJim I have the same issue with my dataset: every image with no bbox is removed from the set. I was thinking of using selective_search to generate background boxes and merging them into the dataset's roidb.

GBJim commented 8 years ago

@Austriker That sounds like a good idea! I am not familiar with selective search tools because I stepped into this research field directly from Faster R-CNN. Any suggestions or tutorial materials to get started with selective search?

manipopopo commented 8 years ago

Roidb entries without any RoIs will be removed from the training set by filter_roidb. We can comment out this check on RoIs to allow background images.
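
A minimal sketch of that change in lib/fast_rcnn/train.py; the structure follows the repo's filter_roidb, and the extra "keep entries with no RoIs" condition is the assumed modification, not an official patch:

```python
import numpy as np
from fast_rcnn.config import cfg

def filter_roidb(roidb):
    """Keep entries with usable RoIs, and also keep background images."""
    def is_valid(entry):
        overlaps = entry['max_overlaps']
        # Foreground RoIs: enough overlap with a ground-truth box
        fg_inds = np.where(overlaps >= cfg.TRAIN.FG_THRESH)[0]
        # Background RoIs: overlap in [BG_THRESH_LO, BG_THRESH_HI)
        bg_inds = np.where((overlaps < cfg.TRAIN.BG_THRESH_HI) &
                           (overlaps >= cfg.TRAIN.BG_THRESH_LO))[0]
        valid = len(fg_inds) > 0 or len(bg_inds) > 0
        # Assumed change: also accept images with no RoIs at all,
        # so that pure-background images survive filtering.
        return valid or len(overlaps) == 0
    return [entry for entry in roidb if is_valid(entry)]
```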

Note that you may want to choose the anchors with the highest box scores as hard negative examples for the RPN. Besides, the hyperparameters may need to be tuned on account of the growing number of negative batches (batches without any positive examples).


I got a floating point exception (C++) during the training process.

py-faster-rcnn calls top.reshape in several layers, and calling blob.reshape([..., 0, ...]) leads to a floating point exception. Any chance of your data layers generating empty batches?

GBJim commented 8 years ago

@manipopopo Thank you for sharing this detailed information!

My goal is to enable Faster R-CNN to learn from those background images (no bounding boxes), so I think I cannot skip those lines in anchor_target_layer and roi_data_layer. Skipping these lines seems equivalent to directly removing the background images from the training data.

Instead, I need to generate (maybe randomly) some bounding boxes for background images and feed them into Faster R-CNN.

Austriker commented 8 years ago

@GBJim For selective_search, the paper is http://koen.me/research/pub/uijlings-ijcv2013-draft.pdf and there is a Python version: https://github.com/AlpacaDB/selectivesearch
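
For what it's worth, a minimal usage sketch of that Python package (the parameter values are illustrative, not tuned):

```python
import selectivesearch              # pip install selectivesearch
import skimage.data

img = skimage.data.astronaut()      # any RGB image as an ndarray
# scale/sigma/min_size control region granularity; values here are arbitrary
_, regions = selectivesearch.selective_search(img, scale=500, sigma=0.9,
                                              min_size=10)
boxes = [r['rect'] for r in regions]  # (x, y, w, h) proposals
```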

@manipopopo Thanks it's very interesting.

I think the solution should be to tweak filter_roidb so that it does not remove images without any bboxes.

manipopopo commented 8 years ago

@GBJim

As long as we don't remove those lines, all anchors on a background image are still negative examples for RPN training, and the RPN still processes background images and generates proposals. These proposals can then serve as negative (__background__) training examples for the rcnn classification sub-network, as long as your imdb's rpn_roidb method doesn't get rid of the records corresponding to background images.

I need to generate (maybe randomly) some bounding boxes for background images

Do the randomly generated bounding boxes play a similar role to the RPN proposals?

GBJim commented 8 years ago

@manipopopo A-ha! I think I understand what you mean now. If those lines in anchor_target_layer and roi_data_layer are skipped, the negative anchors and proposals can be generated without annotating the background images at all. Am I correct?

Austriker commented 8 years ago

@GBJim I think you just need to edit the filter_roidb function and that will do it.

GBJim commented 8 years ago

@Austriker Let me try :+1:

manipopopo commented 8 years ago

@GBJim

If those lines in anchor_target_layer and roi_data_layer are skipped, the negative anchors and proposals can be generated without annotating any background image. Am I correct?

In this case, yes.

  1. As @Austriker said, you may want to modify or comment out filter_roidb.
  2. Those lines in roi_data_layer and anchor_target_layer are skipped because we don't have ground-truth boxes: no bbox_overlaps need to be estimated, and no positive (fg) examples exist.
  3. Make sure that rpn_roidb of your imdb doesn't get rid of the records corresponding to background images.
  4. Make sure there's no chance an empty batch flows through the network.
  5. Remember that the hyperparameters may need to be tuned on account of the growing number of negative batches (batches without any positive examples). Besides, you may want to choose the anchors with the highest box scores as hard negative examples for the RPN. See Training Region-based Object Detectors with Online Hard Example Mining for the concept of hard example mining. An implementation of hard example mining for the rcnn classification sub-network can be found in R-FCN.

GBJim commented 8 years ago

@manipopopo I tried to follow your instructions, but I encountered KeyError: 'boxes' at line 106 of imdb.py.

In my customized imdb class, an empty dictionary is returned when the process asks for the annotation of a negative example. Should I simply insert a skip at imdb.py line 106 to avoid this error?

manipopopo commented 8 years ago

The flipped roidb entry of self.roidb[i] and self.roidb[i] itself are almost the same thing, except that the flipped entry['boxes'] contains a flipped copy of self.roidb[i]['boxes'] and entry['flipped'] is True.

Since there are no ground-truth boxes in a background roidb[i], the only thing we have to do is make sure the flag entry['flipped'] is set to True. You can do whatever you want as long as the structures of normal entries and flipped entries are consistent.
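
A sketch of a structurally consistent background entry; the field names follow datasets/imdb.py, while the empty shapes are an assumption about what downstream code tolerates:

```python
import numpy as np
import scipy.sparse

num_classes = 2  # e.g. __background__ + pedestrian
flipped_entry = {
    'boxes': np.zeros((0, 4), dtype=np.uint16),       # nothing to flip
    'gt_classes': np.zeros((0,), dtype=np.int32),     # no ground-truth classes
    'gt_overlaps': scipy.sparse.csr_matrix(
        np.zeros((0, num_classes), dtype=np.float32)),
    'flipped': True,                                  # the only required flag
}
```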

GBJim commented 8 years ago

@manipopopo I am reading those lines you suggest skipping, but I am wondering if that is appropriate. My training data contains both positive and negative examples.
It seems like your modifications would treat all training data as negative examples. Is that correct?

manipopopo commented 8 years ago

Oh, I meant the lines are skipped only when there is no ground-truth bounding box. For example: if len(gt_boxes): do the following lines.
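
A self-contained sketch of that guard; this simplifies anchor_target_layer.py, and the hard-coded thresholds mirror the cfg.TRAIN.RPN_NEGATIVE_OVERLAP / RPN_POSITIVE_OVERLAP defaults:

```python
import numpy as np

def label_anchors(anchors, gt_boxes, overlap_fn):
    """When gt_boxes is empty, skip the overlap computation entirely and
    label every anchor negative; RPN subsampling then draws its usual
    quota of negatives from them."""
    labels = np.full((anchors.shape[0],), -1, dtype=np.float32)  # -1: don't care
    if len(gt_boxes) > 0:
        overlaps = overlap_fn(anchors, gt_boxes)   # (num_anchors, num_gt)
        max_overlaps = overlaps.max(axis=1)
        labels[max_overlaps < 0.3] = 0             # negative
        labels[max_overlaps >= 0.7] = 1            # positive
    else:
        labels[:] = 0                              # background image: all negative
    return labels
```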

GBJim commented 8 years ago

Hi @manipopopo Following your instructions, I encountered an error at line 117 of anchor_target_layer.py, bbox_targets = _compute_targets(anchors, gt_boxes[argmax_overlaps, :]):

NameError: global name 'argmax_overlaps' is not defined

Because lines 132 to 162 are skipped for negative examples, argmax_overlaps is never defined. I am still thinking about how to solve this correctly. Any ideas?

And by the way, it seems like lines 93 to 103 of minibatch.py, which you suggested skipping, don't matter, because the parent function _sample_rois is called only when cfg.TRAIN.HAS_RPN is set to false.
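
One way around that NameError is to default the regression targets to zero and only compute real targets when ground-truth boxes exist. A hedged sketch, where targets_fn stands in for the repo's _compute_targets helper:

```python
import numpy as np

def safe_bbox_targets(anchors, gt_boxes, argmax_overlaps, targets_fn):
    """Regression targets are all-zero for a background image, since
    no anchor is matched to a ground-truth box there."""
    bbox_targets = np.zeros((anchors.shape[0], 4), dtype=np.float32)
    if len(gt_boxes) > 0:
        bbox_targets = targets_fn(anchors, gt_boxes[argmax_overlaps, :])
    return bbox_targets
```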

GBJim commented 8 years ago

@manipopopo I am stuck at lines 33-44 of minibatch.py. It seems like get_minibatch only returns the image and the positive bounding boxes in the blob dictionary. I tried assigning negative examples a zero array, np.zeros((1, 5), dtype=np.float32), and I got a floating point exception.

manipopopo commented 8 years ago
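
Judging from the replies below, the suggestion here was to pad blobs['gt_boxes'] with a dummy row so the top blob is never empty. A minimal sketch of that idea; the name DUMMY_VALUE comes from the follow-up messages, and its placement in get_minibatch is an assumption:

```python
import numpy as np

DUMMY_VALUE = -1  # must not collide with a real class index (0 is __background__)

def pad_gt_boxes(gt_boxes):
    """Pad an empty gt_boxes blob with one dummy row so that top.reshape
    in the data layer never sees a zero-sized blob (which crashes Caffe
    with a floating point exception)."""
    if gt_boxes.shape[0] == 0:
        # Row layout: [x1, y1, x2, y2, class]; class == DUMMY_VALUE marks a fake box
        gt_boxes = np.array([[0, 0, 1, 1, DUMMY_VALUE]], dtype=np.float32)
    return gt_boxes
```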

GBJim commented 8 years ago

@manipopopo Thank you for the explanations. I am wondering about the DUMMY_VALUE: isn't this value always zero? The last value in each row of blobs['gt_boxes'] represents the class, and the class of a negative example must be zero. Is that correct?

manipopopo commented 8 years ago

For me, setting DUMMY_VALUE to 0 would mean the box is a __background__ bounding box provided by the data set. So I set DUMMY_VALUE to -1 to mark the box as a dummy record, created only to prevent the top blob from being empty. You can set DUMMY_VALUE to whatever you like, as long as you can distinguish the dummy box from ground-truth object proposals.

GBJim commented 8 years ago

Thank you @manipopopo ! My training process is working now. If my testing results improve, I will share my modifications here :)

GBJim commented 8 years ago

@manipopopo I am reading the paper you referenced earlier, Training Region-based Object Detectors with Online Hard Example Mining. If my understanding of hard example mining is correct, the idea is to intentionally sample proposals with higher losses (hard examples) during training and use these selected proposals for backpropagation.

This is a promising approach, but how do I implement it? As you mentioned earlier:

you may want to choose the anchors with higher box scores as hard negative examples for RPN

But I don't have the loss value for each anchor (or proposal). The final loss is calculated in Caffe's SoftmaxWithLoss and SmoothL1Loss layers.

I guess I need to write custom loss layers that integrate with the RPN in order to implement the Online Hard Example Mining of the paper. Is this correct?

I am going to look into the other material you referenced, R-FCN, for more detail :)

I really appreciate all the information you have provided!

GBJim commented 8 years ago

Hi all: After verifying the results of negative-enabled training, I sadly found that the precision is lower than when simply ignoring all negative examples.

As @manipopopo suggested, I modified minibatch.py to balance the negative and positive examples during training. Consecutive negative examples may lead the SGD descent in the wrong direction; I am running new experiments to verify this.

manipopopo commented 8 years ago

When iter_size=2 and the parameter update is evaluated on one normal image, which contains at least one ground-truth object bounding box, plus one background image, you could increase cfg.TRAIN.RPN_FG_FRACTION (cfg.TRAIN.FG_FRACTION) and lower cfg.TRAIN.RPN_BATCHSIZE (cfg.TRAIN.BATCH_SIZE) to balance the amounts of foreground and background training examples manually.
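
As a sketch, those knobs live in lib/fast_rcnn/config.py and can be overridden before training; the values below are illustrative, not recommendations, while the quoted defaults follow the repo:

```python
from fast_rcnn.config import cfg

# RPN minibatch: fewer anchors per image, larger foreground share
cfg.TRAIN.RPN_BATCHSIZE = 128      # default 256
cfg.TRAIN.RPN_FG_FRACTION = 0.7    # default 0.5
# rcnn minibatch: fewer RoIs per image, larger foreground share
cfg.TRAIN.BATCH_SIZE = 64          # default 128
cfg.TRAIN.FG_FRACTION = 0.5        # default 0.25
```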

However, tuning hyperparameters might be a time-consuming process. If your GPU memory is large enough, you can try concatenating foreground images with background images and training the network on the concatenated results with larger cfg.TRAIN.MAX_SIZE and cfg.TRAIN.SCALES.

Besides, you can try running stage 3 and stage 4 of the alternating training strategy for some epochs after the end-to-end training.

xiaoxiongli commented 8 years ago

@GBJim what is your result now?

GBJim commented 8 years ago

@xiaoxiongli Unfortunately, the precision was hurt by negative-enabled training. In detail, my training process did not balance the ratio of positive and negative input images.

I guess that a 10:1 ratio of positive to negative images could be a good starting point to see if negative-enabled training can really improve the overall performance.

Currently, I am working on another project. If you want to experiment with it, just follow @manipopopo's instructions in this thread :)

murphypei commented 7 years ago

This issue is very useful, @GBJim. Thank you!

ArturoDeza commented 7 years ago

A general question that I have is: am I supposed to just exclude the images that don't contain the class from training, if I am doing pedestrian detection (for example)? I would assume that this isn't a good idea. Moreover, what should the .xml file of the non-pedestrian (no target present) image look like?

<?xml version="1.0" encoding="utf-8"?>
<annotation>
   <filename>
      <item>./SCORCH_Stimuli/Set1_XML/1.png</item>
   </filename>
   <folder>scorch</folder>
   <object>
      <bndbox>
         <xmin>144</xmin>
         <ymin>547</ymin>
         <xmax>169</xmax>
         <ymax>585</ymax>
      </bndbox>
      <name>background</name>
   </object>
</annotation>

It also seems like putting a single bounding box would limit the negative-class search space (one bounding box per negative image, versus the hundred or so regions elsewhere in the image). I see there has been some discussion of this already in the thread, but I am wondering what role the .xml files play for the negative classes.

GBJim commented 7 years ago

@ArturoDeza You need to modify the code to accept negative images. In my case, if an image does not have a corresponding annotation (i.e., it is a negative image), its bounding box data is assigned a sentinel value (e.g., None). Then you still have to modify the downstream processing to handle these exceptional bboxes.

This thread has the details of how to modify the code.
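
A sketch of that convention for a custom imdb; the function name and the empty-entry layout are assumptions, and the key point is returning well-formed empty arrays rather than None or an empty dict:

```python
import os
import numpy as np
import scipy.sparse

def load_annotation(xml_path, num_classes):
    """Return a roidb entry; negative images (no annotation file) get
    empty-but-well-formed arrays so downstream code never hits
    KeyError: 'boxes'."""
    if not os.path.exists(xml_path):
        return {'boxes': np.zeros((0, 4), dtype=np.uint16),
                'gt_classes': np.zeros((0,), dtype=np.int32),
                'gt_overlaps': scipy.sparse.csr_matrix(
                    np.zeros((0, num_classes), dtype=np.float32)),
                'flipped': False}
    # Positive images: parse the XML and fill the same fields.
    raise NotImplementedError('XML parsing elided in this sketch')
```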

paviddavid commented 5 years ago

@GBJim I have the same issue you had: I want to train the network with, among other things, images that do not contain any annotations. I read the thread but do not understand every aspect.

Could you summarize the steps needed to change the code, step by step? It would be very helpful to all the people who have similar issues with their own datasets. Thanks in advance.