Object detection typically assumes that training and test data are drawn from an identical distribution, which, however, does not always hold in practice. Such a distribution mismatch will lead to a significant performance drop.
Two levels of domain shift are tackled: (1) image-level shift and (2) instance-level shift.
Contributions
We provide a theoretical analysis of the domain shift problem for cross-domain object detection from a probabilistic perspective.
We design two domain adaptation components to alleviate the domain discrepancy at the image and instance levels, respectively.
We further propose a consistency regularization to encourage the RPN to be domain-invariant.
We integrate the proposed components into the Faster R-CNN model, and the resulting system can be trained in an end-to-end manner.
H-divergence Definition
The H-divergence [1] is designed to measure the divergence between two sets of samples with different distributions.
H-divergence definition (err_S and err_T denote the classification error of a domain classifier h on source and target samples):
d_H(S, T) = 2 (1 − min_{h ∈ H} [err_S(h(x)) + err_T(h(x))])
Here h(·) is a feature-level domain classifier. If the error is high even for the best domain classifier, the two domains are hard to distinguish, so they are close to each other, and vice versa.
To align the source and target domains, we minimize the H-divergence, which amounts to maximizing the error of the optimal domain classifier: the domain classifier is trained to minimize its error, while the feature extractor is trained adversarially to maximize it.
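In practice this min-max is implemented with a gradient reversal layer (GRL), the standard trick from Ganin & Lempitsky that the paper also adopts. A minimal PyTorch sketch (the names `GradReverse`, `grad_reverse` and the `lambda_` weighting are my choices, not the paper's):

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) the gradient
    in the backward pass. Placed between the features and a domain
    classifier, it lets the classifier minimize the domain loss while
    the feature extractor maximizes it -- the adversarial min-max that
    drives the H-divergence down."""

    @staticmethod
    def forward(ctx, x, lambda_):
        ctx.lambda_ = lambda_
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Reverse the gradient flowing back into the feature extractor.
        return -ctx.lambda_ * grad_output, None

def grad_reverse(x, lambda_=1.0):
    return GradReverse.apply(x, lambda_)
```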
Covariate Shift Definition
Given the same image region containing an object, its category label should be the same regardless of which domain it comes from.
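Formally, writing x for an image region and y for its category label (a standard statement of covariate shift; the notation is mine):

```latex
P_S(y \mid x) = P_T(y \mid x) \quad \text{while} \quad P_S(x) \neq P_T(x)
```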
Joint adaptation:
Consider P(B, I) = P(B|I) × P(I).
P(B|I) is assumed to be the same across domains under the covariate shift assumption.
Thus, if P_S(I) = P_T(I), we have P_S(B, I) = P_T(B, I).
In other words, if the distributions of the image-level representations are identical for two domains, the distributions of the instance-level representations are also identical.
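Spelling out the implication chain (S and T denote the source and target domains):

```latex
\begin{align*}
P_S(B, I) &= P_S(B \mid I)\, P_S(I) \\
          &= P_T(B \mid I)\, P_S(I) && \text{(covariate shift: } P_S(B \mid I) = P_T(B \mid I)) \\
          &= P_T(B \mid I)\, P_T(I) && \text{(assumed alignment: } P_S(I) = P_T(I)) \\
          &= P_T(B, I).
\end{align*}
```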
Yet, it is generally non-trivial to perfectly estimate the conditional distribution P(B|I), since:
In practice it may be hard to perfectly align the marginal distributions P(I), which means the input for estimating P(B|I) is somewhat biased.
The bounding box annotations are only available for the source domain training data; therefore P(B|I) is learned using source-domain data only, and is easily biased toward the source domain.
Method
We propose to perform domain distribution alignment on both the image and instance levels, and to apply a consistency regularization to alleviate the bias in estimating P(B|I).
To align the source and target domains, a domain classifier is trained at each level, so we have two domain classifiers (both are sketched right after this list):
Notation: D denotes domain label.
Image-level domain classifier: P(D|I)
Instance-level domain classifier: P(D|B, I)
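Per the paper, the image-level classifier operates patch-wise on the backbone feature map, while the instance-level classifier operates on each ROI-pooled feature vector. A minimal PyTorch sketch, reusing `grad_reverse` from the GRL sketch above (class names, layer widths, and the default channel/feature sizes are my assumptions):

```python
import torch.nn as nn

class ImageLevelDomainClassifier(nn.Module):
    """Patch-wise domain classifier on the backbone feature map,
    modeling P(D|I) at every spatial location."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 256, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 1, kernel_size=1),  # one domain logit per patch
        )

    def forward(self, feat):                   # feat: (N, C, H, W)
        return self.net(grad_reverse(feat))    # (N, 1, H, W) domain logits

class InstanceLevelDomainClassifier(nn.Module):
    """Domain classifier on ROI-pooled instance features,
    modeling P(D|B, I) for every region proposal."""
    def __init__(self, in_features=4096):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, 1),                # one domain logit per ROI
        )

    def forward(self, roi_feat):               # roi_feat: (num_rois, F)
        return self.net(grad_reverse(roi_feat))
```

Both classifiers are trained with binary cross-entropy against the domain label D; the reversed gradient pushes the shared features toward domain invariance.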
By Bayes’ theorem: P(D|B, I) P(B|I) = P(B|D, I) P(D|I).
By enforcing consistency between the two domain classifiers, i.e., P(D|B, I) = P(D|I), the learned P(B|D, I) is encouraged to approach the domain-invariant P(B|I).
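A minimal sketch of how this consistency term could be computed for a single image, assuming the logits produced by the two classifiers sketched above (the function name and the single-image simplification are mine; the paper uses an L2 distance between the image-averaged patch prediction and each instance prediction):

```python
import torch

def consistency_loss(img_logits, ins_logits):
    """Consistency regularizer: push the instance-level prediction
    P(D|B, I) toward the image-averaged image-level prediction P(D|I).

    img_logits: (1, 1, H, W) patch-wise domain logits for one image
    ins_logits: (num_rois, 1) domain logits for that image's proposals
    """
    img_prob = torch.sigmoid(img_logits).mean()      # average over all patches
    ins_prob = torch.sigmoid(ins_logits).squeeze(1)  # probability per ROI
    # Squared L2 distance, averaged over the image's ROIs.
    return ((img_prob - ins_prob) ** 2).mean()
```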