Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop.
Two levels of domain shift are tackled: (1) image-level shift and (2) instance-level shift.
Contributions
We provide a theoretical analysis of the domain shift problem for cross-domain object detection from a probabilistic perspective.
We design two domain adaptation components to alleviate the domain discrepancy at the image and instance levels, respectively.
We further propose a consistency regularization to encourage the RPN to be domain-invariant.
We integrate the proposed components into the Faster R-CNN model, and the resulting system can be trained in an end-to-end manner.
H-divergence Definition
The H-divergence [1] is designed to measure the divergence between two sets of samples with different distributions.
H-divergence definition (err_S and err_T denote the classification error of a domain classifier h on source and target samples):
d_H(S, T) = 2 (1 − min_{h ∈ H} [err_S(h(x)) + err_T(h(x))])
Here h(·) is a feature-level domain classifier. If the error is high even for the best domain classifier, the two domains are hard to distinguish, so they are close to each other, and vice versa.
To align the source and target domains, we minimize the H-divergence, which amounts to maximizing the error of the optimal domain classifier: the domain classifier is trained to minimize its error, while the feature extractor is trained adversarially to maximize it.
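In practice this min-max is implemented with a gradient reversal layer (GRL), the standard trick from Ganin & Lempitsky that the paper also adopts. A minimal PyTorch sketch (the names `GradReverse`, `grad_reverse` and the `lambda_` weighting are my choices, not the paper's):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient
    in the backward pass. Placed between the features and a domain
    classifier, it lets the classifier minimize the domain loss while
    the feature extractor maximizes it -- the adversarial min-max that
    drives the H-divergence down."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the feature extractor.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)
```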
Covariate Shift Definition
Given the same image region containing an object, its category label should be the same regardless of which domain it comes from.
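Formally, writing x for an image region and y for its category label (a standard statement of covariate shift; the notation is mine):

```latex
P_S(y \mid x) = P_T(y \mid x) \quad \text{while} \quad P_S(x) \neq P_T(x)
```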
Joint adaptation:
Consider P(B, I) = P(B|I) × P(I).
P(B|I) is assumed to be the same across domains under the covariate shift assumption.
Thus, if P_S(I) = P_T(I), we have P_S(B, I) = P_T(B, I).
In other words, if the distributions of the image-level representations are identical for two domains, the distributions of the instance-level representations are also identical.
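Spelling out the implication chain (S and T denote the source and target domains):

```latex
\begin{align*}
P_S(B, I) &= P_S(B \mid I)\, P_S(I) \\
          &= P_T(B \mid I)\, P_S(I) && \text{(covariate shift: } P_S(B \mid I) = P_T(B \mid I)) \\
          &= P_T(B \mid I)\, P_T(I) && \text{(assumed alignment: } P_S(I) = P_T(I)) \\
          &= P_T(B, I).
\end{align*}
```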
Yet, it is generally non-trivial to perfectly estimate the conditional distribution P(B|I), since:
In practice it may be hard to perfectly align the marginal distributions P(I), which means the input for estimating P(B|I) is somewhat biased.
The bounding box annotations are only available for the source domain training data; therefore P(B|I) is learned using source-domain data only, and is easily biased toward the source domain.
Method
We propose to perform domain distribution alignment on both the image and instance levels, and to apply a consistency regularization to alleviate the bias in estimating P(B|I).
To align the source and target domains, a domain classifier is trained at each level, so we have two domain classifiers (both are sketched right after this list):
Notation: D denotes domain label.
Image-level domain classifier: P(D|I)
Instance-level domain classifier: P(D|B, I)
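Per the paper, the image-level classifier operates patch-wise on the backbone feature map, while the instance-level classifier operates on each ROI-pooled feature vector. A minimal PyTorch sketch, reusing `grad_reverse` from the GRL sketch above (class names, layer widths, and the default channel/feature sizes are my assumptions):

```python
import torch.nn as nn

class ImageLevelDomainClassifier(nn.Module):
    """Patch-wise domain classifier on the backbone feature map,
    modeling P(D|I) at every spatial location."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # one domain logit per patch
        )

    def forward(self, feat):                   # feat: (N, C, H, W)
        return self.net(grad_reverse(feat))    # (N, 1, H, W) domain logits

class InstanceLevelDomainClassifier(nn.Module):
    """Domain classifier on ROI-pooled instance features,
    modeling P(D|B, I) for every region proposal."""
    def __init__(self, in_features=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 1),                # one domain logit per ROI
        )

    def forward(self, roi_feat):               # roi_feat: (num_rois, F)
        return self.net(grad_reverse(roi_feat))
```

Both classifiers are trained with binary cross-entropy against the domain label D; the reversed gradient pushes the shared features toward domain invariance.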
By Bayes’ theorem: P(D|B, I) P(B|I) = P(B|D, I) P(D|I).
By enforcing consistency between the two domain classifiers, i.e., P(D|B, I) = P(D|I), the learned P(B|D, I) is encouraged to approach the domain-invariant P(B|I).
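A minimal sketch of how this consistency term could be computed for a single image, assuming the logits produced by the two classifiers sketched above (the function name and the single-image simplification are mine; the paper uses an L2 distance between the image-averaged patch prediction and each instance prediction):

```python
import torch

def consistency_loss(img_logits, ins_logits):
    """Consistency regularizer: push the instance-level prediction
    P(D|B, I) toward the image-averaged image-level prediction P(D|I).

    img_logits: (1, 1, H, W) patch-wise domain logits for one image
    ins_logits: (num_rois, 1) domain logits for that image's proposals
    """
    img_prob = torch.sigmoid(img_logits).mean()      # average over all patches
    ins_prob = torch.sigmoid(ins_logits).squeeze(1)  # probability per ROI
    # Squared L2 distance, averaged over the image's ROIs.
    return ((img_prob - ins_prob) ** 2).mean()
```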