krumo / Detectron-DA-Faster-RCNN

Domain Adaptive Faster R-CNN in Detectron
Apache License 2.0

A few questions about reproducing the results #6

Closed BOBrown closed 5 years ago

BOBrown commented 5 years ago

Hi, I'm trying to reproduce your work on the Sim10k->Cityscapes dataset but have run into several issues. Q1: loss_da and loss_dc did not converge (loss_da = 0.69, loss_dc = 0.68) during the 70,000-iteration training process. Was it the same for you? Could the competing optimization objectives between L_det and L_img + L_ins + L_cst lead to this result?

Q2: I reproduced 34.9 AP using the image, instance, and consistency losses for the Sim10k->Cityscapes task with the config file e2e_da_faster_rcnn_vgg16-sim10k.yaml. However, I observed unstable results across different training runs. Is this related to the optimization objective mentioned in Q1?

JeromeMutgeert commented 5 years ago

Note that the adversarial loss is minimized and maximized at the same time. The 'optimal' loss, at which the discriminator can no longer distinguish anything useful, is ln(2) (about 0.69), but in practice it often ends up lower than that because the discriminator can still determine the domain some of the time. Typically this loss starts high, drops quickly, and then converges by rising back toward a value not far below ln(2). So your da loss looks fine; if anything, it sits a bit suspiciously close to the optimum.
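For intuition, here is a toy calculation (plain Python, purely illustrative, not code from this repo) showing why ln(2) is the loss of a perfectly confused binary domain classifier: on a balanced source/target batch, the best it can do is predict p = 0.5 for every sample.

```python
import math

# Binary cross-entropy of a single prediction p against label y (0 = source, 1 = target).
def bce(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With perfectly domain-invariant features, the classifier's best output is
# p = 0.5 for every sample. On a balanced batch of one source and one target example:
p = 0.5
loss = 0.5 * (bce(p, 0) + bce(p, 1))
print(loss, math.log(2))  # both print ~0.6931
```

Any value below ln(2) means the discriminator still extracts some domain signal, which is why the loss usually settles slightly under 0.69 rather than exactly at it.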

krumo commented 5 years ago

Hi @BOBrown , Q1: I think your loss looks okay. The goal of adversarial training is to learn a domain-invariant feature, i.e., a feature that confuses the domain classifier, so a relatively high loss_da is exactly what we want. Q2: Unfortunately, adversarial training is generally unstable. It is quite possible that your best model does not occur at the final iteration. How to make adversarial training more stable is still an open research question.
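For reference, the simultaneous min/max comes from the gradient reversal layer (GRL) placed between the backbone features and the domain classifier, following Ganin & Lempitsky's DANN. Below is a minimal PyTorch sketch of that mechanism; it is a generic illustration, not the Caffe2/Detectron code used in this repository.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # The domain classifier above this layer minimizes loss_da as usual;
        # the feature extractor below it receives -lambd * gradient, so it
        # effectively maximizes the same loss (tries to confuse the classifier).
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    # lambd controls how strongly the adversarial signal hits the backbone.
    return GradReverse.apply(x, lambd)
```

Because the same loss value is pushed down by the discriminator and pushed up by the feature extractor, a curve hovering just below ln(2) indicates a rough equilibrium between the two, not a failure to converge.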

gabridego commented 3 years ago

Hi @krumo @JeromeMutgeert . Sorry for commenting on this issue after such a long time, but I was wondering if there is a specific reason for the domain adaptation loss converging to ln(2) at both the image and instance level. Can this behavior be explained theoretically? Thanks so much.