justinkay / aldi

Official implementation of "Align and Distill: Unifying and Improving Domain Adaptive Object Detection"
https://aldi-daod.github.io/

Evaluation does not match the outcome in the paper #30

Closed: sshhj89 closed this issue 2 weeks ago

sshhj89 commented 3 weeks ago

Hi,

Thank you for your research, and for sharing your work.

I evaluated the model "cityscapes_baseline_strongaug_ema_foggy_val_model_best_591_ema2model.pth" with this command "python tools/train_net.py --config configs/cityscapes/Base-RCNN-FPN-Cityscapes_strongaug_ema.yaml --eval-only"

I got the result shown in the attached screenshot: foggy AP50 is 60.258.

I thought it was trained with strong augmentation and EMA, so the result should be similar to the AP50 (64.3) in Table 3(a). Is this model actually the result with weak augmentation rather than strong augmentation? Could you help with this?

justinkay commented 2 weeks ago

Hi @sshhj89 thanks for your interest.

Two comments -

  1. This model, cityscapes_baseline_strongaug_ema_foggy_val_model_best_591_ema2model.pth is the source-only model trained with strong augmentation and EMA. The associated AP50 in the paper for this model is 59.1. The 64.3 result you refer to is for training the (domain adaptive, not source-only) model with the base settings in Table 2 plus color jitter and MIC augmentations.
  2. Regarding why you are seeing 60.25 instead of 59.1, this could be because Detectron2's evaluation optionally uses a custom (but unofficial) implementation of COCOeval that is faster. I checked, and our results are reported using this implementation. Can you check your log for something like this:
[06/03 16:48:38] d2.evaluation.coco_evaluation INFO: Evaluating predictions with unofficial COCO API...
[06/03 16:48:38] d2.evaluation.fast_eval_api INFO: Evaluate annotation type *bbox*

If you do not see this, that would explain the discrepancy.
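If it helps, here is a quick way to check a log programmatically. This is a minimal stdlib sketch; the helper name and sample line are just illustrative, and only the marker strings come from Detectron2's log output:

```python
# Sketch: detect which COCOeval implementation a Detectron2 log used.
# Only the marker strings below come from Detectron2's log messages;
# the helper itself is illustrative.

def coco_eval_impl(log_text: str) -> str:
    """Return 'fast' if the unofficial (accelerated) COCOeval was logged,
    otherwise 'official' for the standard pycocotools implementation."""
    markers = ("unofficial COCO API", "fast_eval_api")
    return "fast" if any(m in log_text for m in markers) else "official"

sample = ("[06/03 16:48:38] d2.evaluation.coco_evaluation INFO: "
          "Evaluating predictions with unofficial COCO API...")
print(coco_eval_impl(sample))  # prints "fast"
```

Reading the whole log file from your OUTPUT_DIR into log_text and calling the helper on it would do the same check in one shot.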

sshhj89 commented 2 weeks ago

Hi @justinkay Thank you for reply.

I think the discrepancy between 60.25 and 59.1 was because of a different learning rate or machine configuration on my end. I am trying to reproduce the result of strong augmentation and EMA without domain adaptation (64.3 AP50 on FCS, without the L_distillation loss).

Here is my understanding, so please correct me if I am wrong.

Table 2 shows the outcomes of utilizing different methods as ablation studies. To recap: your team achieved 64.3 AP50 with strong augmentation and EMA but without domain adaptation (the L_distillation loss), and achieved 64.0 with the L_distillation loss but without strong augmentation. Finally, you combined strong augmentation, EMA, and the L_distillation loss to obtain 66.3 AP50 on FCS.

I wonder how to get these separate results, i.e. how to set up the config files. Can I reproduce it with the config file below?

_BASE_: "../Base-RCNN-FPN.yaml"
MODEL:
  WEIGHTS: "output/cityscapes/cityscapes_baseline_strongaug_ema/cityscapes_foggy_val_model_best.pth"
  ROI_HEADS:
    NUM_CLASSES: 8
AUG: # hojun add for strong augmentation MIC
  LABELED_INCLUDE_RANDOM_ERASING: True
  UNLABELED_INCLUDE_RANDOM_ERASING: False
  LABELED_MIC_AUG: True
  UNLABELED_MIC_AUG: False
DATASETS:
  TRAIN: ("cityscapes_train",)
  TEST: ("cityscapes_val", "cityscapes_foggy_val",)
  BATCH_CONTENTS: ("labeled_strong",)
EMA:
  ENABLED: True
SOLVER:
  STEPS: (9999,)
  MAX_ITER: 10000
  CHECKPOINT_PERIOD: 100
OUTPUT_DIR: "output/cityscapes/cityscapes_baseline_strongaug_ema/"

Or do I have to add the configuration below to the base config file (maybe all distillation options should be "False")?

DOMAIN_ADAPT:
  TEACHER:
    ENABLED: True
  DISTILL:
    HARD_ROIH_CLS_ENABLED: False
    HARD_ROIH_REG_ENABLED: False
    HARD_OBJ_ENABLED: False
    HARD_RPN_REG_ENABLED: False
    ROIH_CLS_ENABLED: True
    OBJ_ENABLED: True
    ROIH_REG_ENABLED: True
    RPN_REG_ENABLED: True

Could you help with that?

justinkay commented 2 weeks ago

Hi @sshhj89 I think there is some confusion in interpreting Table 3. This is probably my fault for including the source-only results in those tables but being inconsistent in which source-only model I was using for comparison. We have a revision coming soon that addresses this.

Let me see if I can explain line by line:

Table 3a

Source-only model 51.9 - This is the model trained on source-only data without domain adaptation, and without strong augmentations and EMA. It is just included as a point of reference.

Weak (scale & flip) 52.6
+ Color jitter 59.0
+ Color jitter + Erase 63.1
+ Color jitter + MIC 64.3

These models are trained with domain adaptation, ablating the set of strong augmentations used on target data during domain adaptation.
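Roughly speaking, these rows toggle the AUG options from the config snippet you quoted, e.g. for the "+ Color jitter + MIC" row something along these lines (a sketch from memory; double-check against the configs in the repo):

```yaml
# Assumed mapping of the "+ Color jitter + MIC" row onto the AUG keys
# quoted in this thread -- a sketch, not a verbatim repo config.
AUG:
  UNLABELED_INCLUDE_RANDOM_ERASING: False  # Erase is off for this row
  UNLABELED_MIC_AUG: True                  # MIC on target data
```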

Table 3b

Source-only model 59.1 - This is the model trained on source-only data without domain adaptation, and with strong augmentations and EMA. It is just included as a point of reference.

These results are with domain adaptation, using the settings from Table 2 except replacing hard distillation with soft distillation for the last row.
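In config terms, that last row roughly corresponds to flipping the DISTILL flags you quoted, along these lines (a sketch; double-check against the configs in the repo):

```yaml
# Sketch: hard -> soft distillation toggle. Flag names come from the
# DOMAIN_ADAPT snippet quoted in this thread; verify against the repo.
DOMAIN_ADAPT:
  DISTILL:
    HARD_ROIH_CLS_ENABLED: False  # hard pseudo-label losses off
    HARD_OBJ_ENABLED: False
    ROIH_CLS_ENABLED: True        # soft distillation losses on
    OBJ_ENABLED: True
```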

Table 3c

Source-only model 59.1 - This is the model trained on source-only data without domain adaptation, and with strong augmentations and EMA. It is just included as a point of reference.

These results are with domain adaptation, using the settings from Table 2 except using either L_align, L_distill, or both.

Regarding what has changed between the Table 3 results and our best results of 66.8 on FCS -- essentially, our best model ALDI++ is chosen to use the best settings indicated by these ablations. So: strong augmentations + EMA during burn-in; strong augmentations on source data; strong augmentations including MIC on target data. Hope this helps clarify.
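In terms of the options discussed in this thread, that combination would look roughly like the following (a sketch built from the settings named above, not a verbatim repo config; key names come from the snippets you quoted):

```yaml
# Sketch of the ALDI++ choices named above. Key names are taken from
# the config snippets in this thread; values reflect the stated settings.
AUG:
  LABELED_INCLUDE_RANDOM_ERASING: True    # strong augs on source data
  UNLABELED_INCLUDE_RANDOM_ERASING: True  # strong augs on target data
  UNLABELED_MIC_AUG: True                 # including MIC on target data
EMA:
  ENABLED: True                           # EMA, also used during burn-in
DOMAIN_ADAPT:
  TEACHER:
    ENABLED: True                         # domain-adaptive training
```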

Thank you for your close reading, this will help us improve the paper :)

justinkay commented 2 weeks ago

Closing for now - do let us know if you have any further questions.