yuhuayc / da-faster-rcnn

An implementation of our CVPR 2018 work 'Domain Adaptive Faster R-CNN for Object Detection in the Wild'

Weight (lambda) value for image level adaptation and hyperparameter request #17


sehyun03 commented 5 years ago

Hi, I found that the weight for the image-level adaptation loss in "train.prototxt" is set to 1.0, which is not consistent with your paper (where all lambda values are set to 0.1):

layer {
  name: "da_conv_loss"
  type: "SoftmaxWithLoss"
  bottom: "da_score_ss"
  bottom: "da_label_ss_resize"
  top: "da_conv_loss"
  loss_param {
    ignore_label: 255
    normalize: 1
  }
  propagate_down: 1
  propagate_down: 0
  loss_weight: 1
}
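For reference, this is how I would expect the layer to look if the paper's lambda = 0.1 were applied directly as the loss weight (assuming no compensating scaling elsewhere, which may well be wrong):

layer {
  name: "da_conv_loss"
  type: "SoftmaxWithLoss"
  bottom: "da_score_ss"
  bottom: "da_label_ss_resize"
  top: "da_conv_loss"
  loss_param {
    ignore_label: 255
    normalize: 1
  }
  propagate_down: 1
  propagate_down: 0
  loss_weight: 0.1  # lambda from the paper, instead of 1
}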

Also "lr_mult" for instance level domain classifier have 10 times more value than other conv of fc.

layer {
  name: "dc_ip3"
  type: "InnerProduct"
  bottom: "dc_ip2"
  top: "dc_ip3"
  param {
    lr_mult: 10
  }
  param {
    lr_mult: 20
  }
  inner_product_param {
    num_output: 1
    weight_filler {
      type: "gaussian"
      # std: 0.3
      std: 0.05
    }
    bias_filler {
      type: "constant"
    }
  }
}
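For comparison, a typical conv/fc layer elsewhere in the network uses the usual Caffe multipliers (1 for the weights, 2 for the bias). A minimal, purely hypothetical example (the layer name and blobs are placeholders, not from this repo):

layer {
  # hypothetical layer, only to illustrate the usual lr_mult convention
  name: "example_fc"
  type: "InnerProduct"
  bottom: "fc7"
  top: "example_fc"
  param {
    lr_mult: 1   # weights: base learning rate
  }
  param {
    lr_mult: 2   # bias: 2x base learning rate
  }
  inner_product_param {
    num_output: 1024
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
    }
  }
}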

Could you provide the exact hyperparameters ("loss_weight", "lr_mult", "gradient_scaler_param") you used in the paper? It would be appreciated to get the hyperparameters for each setting (image-level DA, image + instance-level DA, image + instance-level DA + consistency loss) and each dataset pair (Sim10k -> Cityscapes, Cityscapes -> Foggy Cityscapes, KITTI <-> Cityscapes). Thank you.

JeromeMutgeert commented 5 years ago

Hi,

I am trying to get familiar with the code too, and I came to similar questions. I think you can find your lambda in the GradientScaler layers that implement the GRLs. They scale the gradient with a factor of -0.1, which effectively results in the right gradient, at least for the Faster R-CNN part of the network. The DA part sees a loss of (L_img + L_ins + L_cst), without the minus sign and without the factor lambda. I think this is desirable for training the adversarial (DA) part.
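For concreteness, a GRL in Caffe usually looks something like the sketch below. This is only my sketch of how such a GradientScaler layer is commonly configured (following the style of Ganin & Lempitsky's GRL implementation); the field names, values, and blob names are assumptions and may differ from this repo's train.prototxt:

layer {
  # hypothetical sketch of a gradient reversal layer (GRL):
  # forward pass is the identity, backward pass multiplies the
  # gradient by a negative constant (here -0.1)
  name: "da_grl"
  type: "GradientScaler"
  bottom: "conv5_3"   # placeholder: the feature map fed to the domain classifier
  top: "da_grl"
  gradient_scaler_param {
    lower_bound: 0.1  # assumed: lower == upper gives a constant scale of 0.1
    upper_bound: 0.1
  }
}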

However, what I just said is not consistent with the rest of the code, because L_ins has both a gradient scaling factor of -0.1 in its GRL and a loss_weight of 0.1 at the dc_loss output. I think the latter factor of 0.1 is cancelled out by the learning rate multipliers (lr_mult 10/20) in the corresponding layers in between, as far as the domain classifier's own parameter updates are concerned. But when its gradient is mixed with the Faster R-CNN loss, it thus seems to be weighted in with only a factor of 0.01 (0.1 from the GRL times the 0.1 loss weight).

About L_cst: I have not found any code for it in this repository. I think you will need to use the Caffe2 implementation for that, see https://github.com/yuhuayc/da-faster-rcnn/issues/4