Closed: JeromeMutgeert closed this issue 5 years ago
Hi @JeromeMutgeert, thanks for your suggestion! You're right: the RoI sampling strategy does matter for the sim10k->cityscapes adaptation task. This bug has been fixed in the latest commit. Actually, when I developed the program I used the whole image as the bounding box instead of randomly generated ones. Any further comments are welcome. Also, unfortunately, I have no idea about the original code for L_cst either.
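For anyone hitting the same empty-label problem, here is a minimal sketch of what "use the whole image as the bounding box" could look like, assuming a Detectron-style roidb entry (the field names here are illustrative and may not match this repo exactly):

```python
import numpy as np

def add_whole_image_box(entry, num_classes):
    """Give an unlabeled target-domain roidb entry one whole-image
    ground-truth box so that loading does not fail on empty labels.
    The field names assume a Detectron-style roidb dict."""
    h, w = entry['height'], entry['width']
    entry['boxes'] = np.array([[0.0, 0.0, w - 1.0, h - 1.0]], dtype=np.float32)
    entry['gt_classes'] = np.array([1], dtype=np.int32)      # arbitrary foreground class
    overlaps = np.zeros((1, num_classes), dtype=np.float32)  # one row per box
    overlaps[0, 1] = 1.0                                     # full overlap with class 1
    entry['gt_overlaps'] = overlaps
    entry['is_crowd'] = np.array([False])
    return entry
```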
Hi @JeromeMutgeert, thanks for your analysis of this project. I am confused about why e2e_da_faster_rcnn_vgg16-sim10k.yaml looks like the following:

```yaml
TRAIN:
  DOMAIN_ADAPTATION: True
  DA_IMG_GRL_WEIGHT: 0.2
  DA_INS_GRL_WEIGHT: 0.2
  WEIGHTS: models/vgg16v2.pkl
  SOURCE_DATASETS: ('sim10k',)
  TARGET_DATASETS: ('cityscapes_car_train',)
  SCALES: (800,)
  MAX_SIZE: 1600
  IMS_PER_BATCH: 2
  BATCH_SIZE_PER_IM: 256
  RPN_MIN_SIZE: 16
TEST:
  DATASETS: ('sim10k', 'cityscapes_car_val',)
```

The dataset 'sim10k' is used both in TRAIN and in TEST. When I run test_net.py, both datasets in the test configuration are evaluated, but the first one, 'sim10k', which was used for training, gets a really low mAP. I wonder why this happens. Generally speaking, the mAP reported in the paper is the result on cityscapes_car_val (sim10k to Cityscapes). Can you give a specific explanation?
Thanks
Hi @krumo,
I've been trying to get the original Caffe 1.0 code by @yuhuayc working, and I am now switching to your code because it is better and includes the DA consistency loss. I am trying to train a domain-adaptive detector using an unlabeled target set. This has posed some problems, because an empty bounding box list in the labels is not accepted by the code. Adding dummy labels (a single arbitrary bounding box per image) resolved all problems in the Caffe 1.0 implementation, but in the current implementation I ran into some extra problems.
The main issue is that the training process in your implementation does not seem to be independent of the target-set labels. They are used by the GenerateProposalLabels Python layer to select 256 out of the 2000 RPN proposals. I would regard this as cheating, because it ensures that L_ins is focused on the most relevant RoIs, selected with respect to the ground-truth labels.
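To make the point concrete, a label-free selection for target images could look roughly like this (just a sketch; `sample_target_rois` is a hypothetical helper, not an existing layer in the repo):

```python
import numpy as np

def sample_target_rois(rpn_rois, num_samples=256, rng=np.random):
    """Pick RoIs for the instance-level domain classifier on target images
    without touching any ground-truth boxes: a plain random subset of the
    RPN proposals. rpn_rois is an (N, 5) array of (batch_idx, x1, y1, x2, y2)."""
    num_rois = rpn_rois.shape[0]
    keep = rng.choice(num_rois, size=min(num_samples, num_rois), replace=False)
    return rpn_rois[keep]
```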
In the original implementation by @yuhuayc this is avoided by feeding not only the selection of 256 (128) RoIs but all 2000 RPN proposals through RoI pooling down to the bottleneck features and into L_ins. This is done for both source and target images. (See https://github.com/yuhuayc/da-faster-rcnn/blob/master/models/da_faster_rcnn/train.prototxt, where 'concat_rois' is defined as the input for RoI pooling, and after the bottleneck features they are split again in the 'slice_feats' layer.)
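Roughly, the data flow of those two layers is as follows (a sketch only; `extract_feats` stands in for the shared RoI pooling plus fc layers):

```python
import numpy as np

def detection_and_da_feats(selected_rois, all_rpn_rois, extract_feats):
    """Mimic the concat_rois / slice_feats pattern of the original prototxt:
    the label-selected RoIs (for the detection losses) and all RPN proposals
    (for L_ins) share the same RoI pooling and bottleneck layers, and the
    resulting features are split again afterwards."""
    concat_rois = np.concatenate([selected_rois, all_rpn_rois], axis=0)
    feats = extract_feats(concat_rois)   # shared bottleneck features
    n = selected_rois.shape[0]
    det_feats = feats[:n]                # fed to the cls/bbox heads
    da_ins_feats = feats[n:]             # fed to the instance-level domain classifier
    return det_feats, da_ins_feats
```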
(This might also explain why you changed lambda to 0.2: with 256 instead of 2000 proposals there are roughly 8 times fewer instances in the sum of L_ins.)
I've trained a network with source coco_2017_train + target unlabeled_imgs (with arbitrary bounding boxes) and validated it on coco_2017_val. The overall AP of 0.141 seems quite low. It could be because of my unlabeled image set.
Also, do you perhaps know what code was used to produce the original results (including L_cst) in the paper? @yuhuayc is referring to your repo for that. Is that correct?