Closed TimandXiyu closed 3 years ago
While reproducing the PDA results on Office-Home, I also tried unlocking the gradient for the source classifier. Interestingly, there doesn't seem to be much accuracy difference after letting the classifier update its weights. The accuracy difference is within 0.5% and sometimes close to zero.
Therefore, I am assuming the classifier is locked mainly because the problem setup demands it, not because there is a theory stating that doing so is more accurate? I did read Section 3.2 of the paper, which mentions freezing the gradients of ht, but maybe I didn't quite get it.
If I indeed missed something in the paper, sorry for that.
Sorry for the late reply. In fact, we also tried that solution last year, and it did not always work. It behaves like a classifier adjustment without any feature representation learning in the target domain. You can try other datasets or other UDA settings yourself. Besides, freezing the classifier head and learning the feature extractor sounds more reasonable for learning domain-invariant features under the same classifier in UDA.
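For reference, the freeze-the-head setup described above can be sketched in PyTorch roughly as follows. The module names, dimensions, and loss here are illustrative placeholders, not the repo's actual code; the point is just that only the feature extractor receives gradients:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the backbone and the source-trained classifier.
feature_extractor = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 256))
classifier_head = nn.Linear(256, 65)  # e.g. 65 Office-Home classes

# Freeze the classifier head: target-domain adaptation then only
# updates the feature extractor toward domain-invariant features.
for p in classifier_head.parameters():
    p.requires_grad = False

# The optimizer only sees the trainable (feature-extractor) parameters.
optimizer = torch.optim.SGD(
    [p for p in feature_extractor.parameters() if p.requires_grad], lr=1e-3
)

x = torch.randn(8, 512)                 # dummy unlabeled target batch
logits = classifier_head(feature_extractor(x))
loss = logits.logsumexp(dim=1).mean()   # placeholder loss for illustration
loss.backward()

# The frozen head accumulates no gradients; the extractor does.
assert classifier_head.weight.grad is None
assert feature_extractor[0].weight.grad is not None
```

Unlocking the head (setting `requires_grad = True` and adding its parameters to the optimizer) is exactly the variant discussed in this thread.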
Hmm, ok, thanks for your reply. It is indeed true that freezing it is more suitable for UDA tasks.