burakceng closed this issue 4 years ago
That depends on the pipeline you adopt. If you want to implement different data preprocessing, then you need to write a new dataset (which can also inherit from existing ones). If you only change the box prediction, you need to write another bbox head. If you change the whole pipeline, then you need to implement a new detector.
Now, I am planning to construct my dataset on top of the original COCO dataset by adding another field, `gt_counts`. I think it's easy to integrate into PyTorch's dataset loader; however, the signatures of the models' `forward()` methods seem to be fixed, and if I try to add another keyword argument to, for example, `BBoxHead.forward()`, I cannot make it work at all. So my question is: where can I find the line of code that calls the forward method of a specific detector model (e.g. `FasterRCNN.forward()`)?
The training API shows a `model(**data)` call, which I believe does what I am asking about, but it is closed to me, so I cannot trace into it. Probably I'm missing something and have accidentally put myself into a messy situation because of the details. In other words: how can I pass my new annotation field into training, especially when computing the losses?
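To make the question concrete, here is the kind of plumbing I mean, sketched in plain Python rather than the actual mmdetection/mmcv code (`forward_train` and the `gt_counts` field are my own names and assumptions, not the repo's exact API):

```python
# Minimal sketch of the model(**data) dispatch I am asking about.
# The dataset returns a dict; the training loop unpacks it into the
# model's forward, so any extra field (here `gt_counts`) arrives as a
# keyword argument without changing the rest of the pipeline.

class ToyDetector:
    def __call__(self, **data):
        # In mmdetection, BaseDetector.forward dispatches to
        # forward_train during training; here we just pass kwargs along.
        return self.forward_train(**data)

    def forward_train(self, img, gt_labels, gt_counts=None, **kwargs):
        losses = {"loss_cls": 0.5}
        if gt_counts is not None:
            # Pretend count loss: absolute error against a dummy
            # all-zero prediction, just to show the plumbing.
            pred_counts = [0] * len(gt_counts)
            losses["loss_count"] = sum(
                abs(p - g) for p, g in zip(pred_counts, gt_counts)
            )
        return losses


data = {"img": None, "gt_labels": [1, 3], "gt_counts": [2, 0, 1]}
model = ToyDetector()
losses = model(**data)  # the model(**data) call from the train loop
print(losses)  # {'loss_cls': 0.5, 'loss_count': 3}
```

So if the dataset's output dict simply carries the extra field, the only signature that needs to accept it is the detector's own training entry point.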
Sorry for the bad explanation, and thanks in advance...
Burak
I guess I dealt with it. Closing...
Hello again,
Sorry for re-opening this issue, but opening another one would be messier. This time I have a couple of questions about the mechanics of training rather than the implementation.
First of all, I have two custom datasets, namely `WeakCocoDataset` and `WeakVocDataset`. Normally, they should contain only the labels that I am going to provide, but I thought that, for simplicity of mAP calculation, I should keep the original label and bbox annotations intact; therefore, both datasets are pretty much the same as their fully-supervised counterparts, with my labels added. I am not going to use any of those original annotations in training, though. Only the annotation associated with my task is used, namely the instance count of each class in a particular image.
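To make that annotation concrete: the extra field is just the per-class instance count derived from the original boxes. A toy version of what my dataset computes (pure Python over a COCO-style annotation list; the real thing lives inside `WeakCocoDataset`):

```python
# Toy derivation of per-class instance counts from COCO-style
# annotations. Each annotation dict carries a `category_id`;
# counts[c] is the number of instances of class c in the image.

def annotations_to_counts(annotations, num_classes):
    counts = [0] * num_classes
    for ann in annotations:
        counts[ann["category_id"]] += 1
    return counts


# Example image with two instances of class 0 and one of class 2:
anns = [{"category_id": 0}, {"category_id": 2}, {"category_id": 0}]
gt_counts = annotations_to_counts(anns, num_classes=3)
print(gt_counts)  # [2, 0, 1]
```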
Secondly, I have a custom pipeline which resembles Faster R-CNN pretty closely, but it's trained with weak supervision and the implementation follows accordingly. A short overview of my model:

- `rpn_cls_loss` and `rpn_reg_loss` values are not computed and introduced to backpropagation, as that would require ground-truth bboxes; only the outputs are used, and there is nothing fancy added to it.
- `dim=0` with, e.g., a tensor of size 2000 x 81 (2000 boxes, 81 classes for COCO's case, where the +1 is for background).
- `MSELoss` with the predicted counts; hence, regression. For the classification, I change the label provided by the original COCO to a one-hot encoding by hand, following the rule that if a class is in the image, then its corresponding label is 1; otherwise, it is -1. Right now I use the `torch.nn.functional.cross_entropy(reduction='none')` loss directly, as I don't have weights from bboxes.

My problems are:
- `torch.nn.MultiMarginLoss(p=1, margin=1.0)`, but the loss stayed at the same value from the start; I tried `cross_entropy` with all values of the `reduction` parameter, but that did not help (currently I am using it as you set it, i.e. `reduction='none'`). I implemented the energy function proposed in https://arxiv.org/abs/1511.02853 , but it hilariously gave me `nan` values, which it should not by design of the model and the nature of the function. I guess I got very small values, and taking `log()` of them resulted in something like infinity, hence the `nan`; this then made me question whether the gradient clipping is actually working or not.
- `1e-6`.
Any advice?
Sorry for that adventurous explanation and if I created plot holes.
Thanks in advance...
Burak
Hello, there!
I was going through some research papers on weakly-supervised object detection (no bounding boxes in the ground truth, only image-level labels). I wonder if I can implement it on top of this repository. I was thinking of doing at least the following (I guess):
Up until this point, I have been stuck at several places regarding this bbox regression loss cancellation.
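To show what I mean by cancelling the regression loss, here is a generic sketch over a dict of loss terms (the `loss_bbox`/`loss_rpn_reg` key names are my assumption of the repo's convention, not its exact API):

```python
# Sketch of cancelling regression losses for weak supervision:
# drop every loss term that depends on ground-truth boxes, keep the
# rest. Key names here are assumed/illustrative.

REG_KEYS = {"loss_bbox", "loss_rpn_reg"}

def cancel_regression_losses(losses):
    return {k: v for k, v in losses.items() if k not in REG_KEYS}


losses = {"loss_cls": 0.7, "loss_bbox": 0.3, "loss_rpn_reg": 0.1}
weak_losses = cancel_regression_losses(losses)
print(weak_losses)  # {'loss_cls': 0.7}
```

Since the detectors return their losses as a dict before the total is summed for backward, filtering that dict seemed like the natural place to intervene.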
The papers I read are https://arxiv.org/abs/1511.02853 , https://arxiv.org/abs/1704.00138 .
Thanks for your efforts on this repo.
Kind regards,
Burak