uitrbn / CST_DA_detection

MIT License

unclear parts between overall objective in the paper and in the code #4

Open AutoFine opened 1 year ago

AutoFine commented 1 year ago

Hi, thanks for the ingenious ideas in your paper. I come from a domain adaptation background, but in the image classification area. After doing some literature research on object detection/Faster R-CNN, I would like to gain a deeper understanding of why your method works, which is why I need to ask a few detailed questions.

  1. In the self-training process, where does the initial knowledge of the feature extractor F come from, so that it can generate pseudo-labels at the RPC (which are then used for the RPN)? My assumption would be the model parameters trained on source data. Is that why, in the DA training, the target training comes first and the source training second?
  2. Regarding the RPN weighted alignment part: is this domain_cls_loss the L_adv mentioned in the paper? Otherwise, I don't find any alignment of the source and target domains with the RPN score.
  3. In the paper, the overall loss for the target domain consists of several terms. Is it correct that the overall target-domain loss has been divided into three parts (loss_cityscapes_pl, discrepancy_loss_min, and discrepancy_loss_max) that are back-propagated separately? Is that why loss_cityscapes_pl (in the code) = L_adv_t + alpha * L_rpn_t + beta * L_cls_t (in the paper)?
  4. Regarding the maximize-discrepancy-classifier part on the detectors: there are two levels of discrepancy, one between the RPN and the RPC, and another between the feature extractor and these two classifiers. The feature extractor is trained to maximize L_MCD, but it is not explained why or how the feature extractor is connected to the foreground/background discrepancy between the RPN and the RPC. Yes, the same formula is maximized with respect to the feature extractor, but it is not clear to me how the high weighting of foreground/background ROIs between the RPN and the RPC relates to the feature extractor. I don't understand how the optimization between the feature extractor and the two classifiers happens.

Many thanks in advance

abucka commented 1 year ago

Hi, I would also be interested in the answers to the above questions. Additionally, could you please answer the following questions:

  1. Regarding the cityscapes_to_foggycityscapes_da.py script: does domain_cls_loss.mean() in the code refer to L_adv_t in the paper? Also, why does loss_cityscapes_pl equal domain_cls_loss.mean() when the epoch is greater than 4 and rpn_loss_cls is zero, and why does this mean skipping the target? This part isn't explained in the paper.

  2. I would like to increase batch_size for my own training. However, I get an error when I change batch_size from 1 to another value. Can you please explain why batch_size is fixed at 1?

I would be very grateful if you could answer these questions. Thank you very much for your great work!

uitrbn commented 1 year ago
  1. Yes. The initial knowledge comes from the ImageNet-pretrained backbone and from source training. However, the order of the source/target training within each step in cityscapes_to_foggycityscapes_da.py is arbitrary.
  2. Yes.
  3. Yes.
  4. The feature of each ROI is extracted by the backbone and then processed by the ROI Align module; therefore, the final loss between the RPC and the RPN can be back-propagated to the backbone. It is similar to a regular detection loss.
  5. Epoch 4 is a parameter to stabilize training: the pseudo labels generated by the source-trained model are inaccurate at the beginning, so we first perform domain alignment only. Skipping the target is performed when rpn_loss_cls is zero, which indicates that there is no pseudo label (ROI) for this target input.
  6. No special reason. I set the batch size to 1 because it fits my GPU memory and simplifies the code logic.
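To illustrate answer 1, the self-training bootstrap can be sketched as follows: the source-trained detector scores ROIs on a target image, and only confident detections are kept as pseudo ground truth. This is a minimal sketch; the threshold, data, and function names are illustrative, not taken from the repository.

```python
# Detections on a target image from the source-trained model:
# (class_name, confidence, box). All values here are made up.
detections = [
    ("car",    0.93, (12, 30, 88, 70)),
    ("person", 0.41, (50, 10, 60, 40)),
    ("car",    0.78, (90, 35, 140, 80)),
]

CONF_THRESHOLD = 0.8  # assumed cut-off, not the repo's actual value

def make_pseudo_labels(dets, threshold=CONF_THRESHOLD):
    """Keep only high-confidence detections as pseudo ground truth."""
    return [(cls, box) for cls, score, box in dets if score >= threshold]

pseudo = make_pseudo_labels(detections)
print(pseudo)  # [('car', (12, 30, 88, 70))]
```

Low-confidence detections are discarded so that noisy predictions do not corrupt the target-domain training signal; if no detection survives the threshold, the target image has no pseudo labels at all, which connects to the skip behavior in answer 5.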
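Regarding answer 4, the key point is that both heads consume features produced by the shared backbone, so the discrepancy between their outputs is a function of the backbone parameters and has a nonzero gradient with respect to them. A torch-free toy sketch (all functions and constants are illustrative stand-ins, not the paper's actual architecture) makes this visible with a numerical gradient:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Toy "backbone": one shared scalar feature computed from input x and weight w.
def feature(w, x):
    return w * x

# Two heads standing in for the RPN and RPC foreground scores.
def p_rpn(f):
    return sigmoid(2.0 * f - 0.5)

def p_rpc(f):
    return sigmoid(-1.0 * f + 1.0)

def discrepancy(w, x):
    """L1 discrepancy between the two heads' foreground probabilities."""
    f = feature(w, x)
    return abs(p_rpn(f) - p_rpc(f))

def grad_w(w, x, eps=1e-6):
    """Numerical gradient of the discrepancy w.r.t. the backbone weight."""
    return (discrepancy(w + eps, x) - discrepancy(w - eps, x)) / (2 * eps)

g = grad_w(w=0.3, x=1.0)
print(g)  # nonzero: the discrepancy loss reaches the backbone
```

Because this gradient is nonzero, optimizing the discrepancy (whether minimizing or maximizing it, as in the discrepancy_loss_min/discrepancy_loss_max steps) updates the feature extractor, exactly as a regular detection loss would.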
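The control flow described in answer 5 can be sketched compactly. This is one possible reading of the script, with illustrative names (WARMUP_EPOCHS, target_loss); only the branch structure is taken from the answer: before the warm-up ends, or when rpn_loss_cls is zero (no pseudo labels for this target image), only the adversarial domain classification loss is used.

```python
WARMUP_EPOCHS = 4  # assumed name for the epoch-4 threshold

def target_loss(epoch, rpn_loss_cls, domain_cls_loss_mean,
                loss_with_pseudo_labels):
    """Return the target-domain loss for one iteration.

    During warm-up, or when rpn_loss_cls is zero (no pseudo labels /
    ROIs for this target input), fall back to the domain classification
    loss alone, i.e. "skip" the target detection losses.
    """
    if epoch <= WARMUP_EPOCHS or rpn_loss_cls == 0:
        return domain_cls_loss_mean
    return loss_with_pseudo_labels

# Early epoch: only domain alignment is trained.
print(target_loss(2, 0.7, 0.10, 1.5))   # prints 0.1
# Later epoch but no pseudo labels: target detection losses are skipped.
print(target_loss(6, 0.0, 0.10, 1.5))   # prints 0.1
# Later epoch with pseudo labels: the full target loss is used.
print(target_loss(6, 0.7, 0.10, 1.5))   # prints 1.5
```

Note that this branch is equivalent to the condition asked about above: with epoch greater than 4 and rpn_loss_cls equal to zero, the function returns domain_cls_loss_mean.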