lhoyer / MIC

[CVPR23] Official Implementation of MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation

Consistency Loss not used for detection #49

Closed ShenZheng2000 closed 1 year ago

ShenZheng2000 commented 1 year ago

In the object detection code, the original call to the consistency_loss function is replaced with an L1 loss computation. The rationale behind this change is not clear from the repository.

In addition, enforcing consistency between the detection results for a target image and its masked version seems counterintuitive. Detection of a completely masked object is expected to fail, which would make consistency between the two views hard to achieve.

Could you explain it a little bit? Thanks!

krumo commented 1 year ago

Hi, thanks for your question! Regarding the call to the consistency loss, our implementation is based on the SADA framework instead of the earlier da-faster framework, and the consistency loss is implemented slightly differently in the two. The original call is applied to da_img_consist and da_ins_consist, while the new L1 loss is applied to da_img_rois_probs_pool and da_ins_consist. To align the image-level features with da_ins_consist, the original call computes the average of da_img_consist over all spatial locations, whereas the new implementation uses the ROI-pooled da_img_consist. In practice, the latter should be more stable than the former.
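For illustration, here is a minimal PyTorch sketch of the two variants. The tensor names (img_domain_probs, ins_domain_probs, rois) and the spatial_scale value are placeholders rather than the actual SADA/da-faster identifiers; the point is only the difference between averaging the image-level domain map over all spatial locations and ROI-pooling it at the instance boxes before taking the L1 difference with the instance-level predictions.

```python
import torch
import torch.nn.functional as F
from torchvision.ops import roi_align

def consistency_spatial_mean(img_domain_probs, ins_domain_probs):
    """da-faster-style variant: average the image-level domain map over all
    spatial locations and compare it with every instance prediction via L1.
    Assumes a single image per batch to keep the broadcasting simple."""
    # img_domain_probs: (1, 1, H, W); ins_domain_probs: (R, 1)
    img_avg = img_domain_probs.mean(dim=(2, 3))                   # (1, 1)
    return F.l1_loss(img_avg.expand_as(ins_domain_probs), ins_domain_probs)

def consistency_roi_pooled(img_domain_probs, ins_domain_probs, rois,
                           spatial_scale=1.0 / 16):
    """SADA-style variant: ROI-pool the image-level domain map at each
    instance box so the image-level signal is spatially aligned with the
    instance before taking the L1 difference."""
    # rois: (R, 5) = (batch_idx, x1, y1, x2, y2) in image coordinates
    pooled = roi_align(img_domain_probs, rois, output_size=(1, 1),
                       spatial_scale=spatial_scale)               # (R, 1, 1, 1)
    return F.l1_loss(pooled.flatten(1), ins_domain_probs)
```

Intuitively, ROI-pooling ties the image-level signal to the same regions that produced the instance predictions, instead of comparing every instance against a single global average, which is consistent with the claim above that it is more stable in practice.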

For applying MIC to object detection, we mask the target images randomly, so both foreground objects and background regions can be masked. Enforcing consistency between the student predictions and the pseudo labels produced by the teacher alleviates misclassification of background regions and improves the recognition of occluded foreground objects. Since the masking is random, there is a chance that some foreground objects are completely masked. However, the same risk exists in the semantic segmentation and image classification tasks. The experiments and analysis in the paper suggest that the enhanced context learning outweighs the possible loss from completely masking objects.
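As a rough sketch of the masking and consistency step being described (the patch size, mask ratio, and function names are illustrative, not the repository's actual configuration):

```python
import math
import torch

def random_patch_mask(image, patch_size=64, mask_ratio=0.5):
    """Zero out a random subset of patches of a (C, H, W) image, in the
    spirit of MIC's target-image masking (values here are illustrative)."""
    _, h, w = image.shape
    gh, gw = math.ceil(h / patch_size), math.ceil(w / patch_size)
    keep = (torch.rand(gh, gw) > mask_ratio).float()
    mask = keep.repeat_interleave(patch_size, 0).repeat_interleave(patch_size, 1)
    return image * mask[:h, :w]

# Conceptual training step (detectors and the loss are placeholders):
# pseudo_labels = teacher_detector(target_image)   # teacher sees the full image
# masked_image  = random_patch_mask(target_image)  # student sees the masked one
# student_preds = student_detector(masked_image)
# loss_mic      = detection_loss(student_preds, pseudo_labels)
```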