tim-learn / SHOT

Code released for our ICML 2020 paper "Do We Really Need to Access the Source Data? Source Hypothesis Transfer for Unsupervised Domain Adaptation"
MIT License

The reasons for locking the source classifier #22

Closed: TimandXiyu closed this issue 3 years ago

TimandXiyu commented 3 years ago

While reproducing the PDA results on Office-Home, I also tried unlocking the gradients for the source classifier. Interestingly, there doesn't seem to be much accuracy difference after letting the classifier update its weights; the difference is within 0.5% and sometimes close to zero.

Therefore, I am assuming the classifier is locked mainly because the problem setup demands it, and not because there is a theory stating that doing so is more accurate? I did read Section 3.2 of the paper, which mentions locking the gradients for ht, but maybe I didn't quite get it.

If I indeed missed something in the paper, sorry for that.
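
For clarity, this is roughly what I mean by unlocking the classifier (a minimal sketch with toy module names, not the exact code from this repo):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the feature extractor and the source classifier
# (illustrative only; the repo splits these into separate modules).
feat_extractor = nn.Sequential(nn.Linear(2048, 256), nn.BatchNorm1d(256), nn.ReLU())
classifier = nn.Linear(256, 65)  # 65 classes for Office-Home

# "Unlocking": include the classifier parameters in the optimizer,
# so its weights are also updated during target adaptation.
params = list(feat_extractor.parameters()) + list(classifier.parameters())
optimizer = torch.optim.SGD(params, lr=1e-3, momentum=0.9)
```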

tim-learn commented 3 years ago

Sorry for the late reply. In fact, we also tried that solution last year, and it did not always work. It behaves like a classifier adjustment without any feature representation learning in the target domain. You can try other datasets or other UDA settings yourself. Besides, freezing the classifier head and learning only the feature extractor seems more reasonable for learning domain-invariant features for the same classifier in UDA.
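
As a rough sketch of what freezing means here (toy module names, not the exact released code), the source hypothesis is simply kept out of the optimizer so that only the feature extractor is updated on the unlabeled target data:

```python
import torch
import torch.nn as nn

# Illustrative modules; in practice these correspond to the feature
# extractor (plus bottleneck) and the source-trained classifier head.
feat_extractor = nn.Sequential(nn.Linear(2048, 256), nn.BatchNorm1d(256), nn.ReLU())
classifier = nn.Linear(256, 65)

# Freeze the source hypothesis: no gradients, no optimizer updates.
for p in classifier.parameters():
    p.requires_grad = False

# Only the feature extractor is optimized on the target data, so the
# features are pushed to fit the fixed source classifier.
optimizer = torch.optim.SGD(feat_extractor.parameters(), lr=1e-3, momentum=0.9)
```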

TimandXiyu commented 3 years ago

Hmm, OK, thanks for your reply on that. It does seem that freezing it is more suitable for UDA tasks.