Accuracy reported on the validation set of VisDA-2017 instead of the test set, at least in the code. Are the results reported in the BSP paper also on the validation set, using the same code output? #3

thuml / Batch-Spectral-Penalization

Code release for Transferability vs. Discriminability: Batch Spectral Penalization for Adversarial Domain Adaptation (ICML 2019)


The paper reports SOTA results compared to CDAN on the VisDA-2017 dataset. However, there may be some reproducibility issues.

On closer inspection, the provided code reports accuracy only on the validation set, not on the test set. This raises the question of whether the results in the paper were obtained on the same validation set or on the actual test set. If the reported results are indeed on the test set, the current code may not be the latest version.

Any clarification regarding this would be very helpful. Also, running the current code on VisDA-2017 reproduces an average accuracy of 77.75%, which is higher than the 75.0% reported in the paper. Please refer to the screenshot below for my result.

[Screenshot: Screenshot_20200521-111902~01]
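VisDA-2017 classification results are commonly reported as the mean of the 12 per-class accuracies rather than the overall fraction of correctly classified samples, and the two numbers can differ noticeably on an imbalanced split. Assuming the "average accuracy" above refers to that per-class mean (the thread does not say exactly how the BSP code computes it), the minimal sketch below contrasts the two metrics. The function names are hypothetical and are not taken from the BSP repository.

```python
from collections import Counter


def per_class_accuracy(labels, preds, num_classes):
    """Accuracy for each class index; None if a class has no samples in `labels`."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, preds) if y == p)
    return [
        correct[c] / total[c] if total[c] else None
        for c in range(num_classes)
    ]


def mean_class_accuracy(labels, preds, num_classes):
    """Unweighted mean of per-class accuracies, skipping classes absent from `labels`."""
    accs = [a for a in per_class_accuracy(labels, preds, num_classes) if a is not None]
    return sum(accs) / len(accs)


def overall_accuracy(labels, preds):
    """Fraction of all samples classified correctly (weighted by class frequency)."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


if __name__ == "__main__":
    # Toy, imbalanced two-class example (not VisDA data).
    labels = [0] * 8 + [1] * 2
    preds = [0] * 8 + [0, 1]
    print(overall_accuracy(labels, preds))        # 0.9
    print(mean_class_accuracy(labels, preds, 2))  # 0.75
```

For VisDA-2017 one would pass `num_classes=12`. When comparing a reproduced number against a paper, it is worth confirming which of the two metrics, and which split, each number was computed on.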

Any thoughts on this, along with my original query, would also be helpful.

Best regards, SB (sobalgi)

The validation set of VisDA-2017 is commonly used as the target-domain data, so I don't think evaluating on the validation set raises any issues. Nonetheless, some papers proposing strong methods do report performance on the test set. Another question: do you think the accuracy converges too fast?
