Replicating paper results

thuml / OpenDG-DAML

Code release for Open Domain Generalization with Domain-Augmented Meta-Learning (CVPR2021)

32 stars 6 forks source link

Replicating paper results #2

Closed FrancescoCappio closed 2 years ago

FrancescoCappio commented 3 years ago

Hello! I am really interested in the Open Domain Generalization setting that you propose in your paper, as I think it is really useful for real world applications. I am therefore trying to replicate the results of your method following the instructions here in the repository. I trained all models on the OfficeHome Dataset and then tested them.

The output I obtain (e.g. for the shift ACP->R) is:

Best OverallAcc: 60.19 Best threshold Acc: 0.495 Best OverallHscore: 60.87 Best threshold Hscore: 0.495 Best OverallCaa: 59.71 Best threshold Caa: 0.596

If I understand correctly the "best overall accuracy" is what you print in the "Acc" column of table 3 of the paper. While the "best overall H-score" is what you print in the "H-score" column. Is this correct?

sy565612345 commented 2 years ago

Hello, this is the test results of the whole target domain (including known classes and unknown classes), and the best overall H-score is the H-score in the paper.

However, the Acc in the paper is the result only on the known classes in the target domain, which is printed during training (when you run daml.py).

FrancescoCappio commented 2 years ago

Thank you very much for your answer! I do have some other questions. So if I understand correctly in order to replicate the paper accuracy (Acc) (which is on known classes only) I should look at the line best_test_acc1 =... which is printed at the end of training, right?

One other question: this known classes accuracy (Acc) is computed without looking at the confidence score and so it is not the same known classes accuracy which is used to compute the H-score (called insider in validate.py)? I mean the formula for H-score is: H-score = 2(known_accunk_acc)/(known_acc+unk_acc)

But the known_acc value in the formula (insider) is obtained by selecting predictions with confidence higher than the threshold (and unk_acc is obtained by selecting predictions with confidence lower than the threshold, and is called outsider in validate.py) while for the known accuracy you put in the paper (Acc) you do not use any confidence threshold right?

sy565612345 commented 2 years ago

Q1: We report 'best_val_test_acc1' in the code, which uses the checkpoint that achieves the highest acc on the held-out validation set, and then tests it on the target domain. It is printed in line 209 of the code and is shown as 'Mean validation acc...' in the line before 'best_val_acc1' and 'best_test_acc1' in the training log. The 'best_test_acc1' uses the checkpoint that achieves the highest acc on the target domain, which may be similar or slightly higher than 'best_val_test_acc1'.

Q2: H-score is used in situations when unknown classes exist. In this real situation, we need to identify first whether the sample is from unknown or known classes, and then classify the known classes into specific labels. And the acc of known classes here is different from that in Q1 (For example, some known class examples may be incorrectly classified as unknown because of the low confidence here).

FrancescoCappio commented 2 years ago

Ook! Thank you very much, now everything is clear! I just have one last question: I noticed in validate.py that you divide the range of confidence values in 10 intervals and then you evaluate H-score with 10 threshold values. At the end you print the highest H-score. I was wondering if you really find this strategy good considering that of course you cannot apply it on unlabeled target data, don't you have any suggestion on how to choose an appropriate threshold value for unlabeled data?

sy565612345 commented 2 years ago

Thank you for the valuable advice, my idea is that we can choose a certain percentile of the confidence on the held-out data or maybe we can use some additional data as an agent for outliers, I think it remains an open problem.

FrancescoCappio commented 2 years ago

Ok thank you very much!