yueatsprograms / uda_release

Unsupervised Domain Adaptation through Self-Supervision
The Unlicense

CIFAR10->STL: bug in the number of classes + the results obtained with the provided scripts don't match the paper Table #6

Open EvgeniaAR opened 3 years ago

EvgeniaAR commented 3 years ago

First, the line that sets the number of classes to 9 here is never triggered, because the bash script sets "target" to "stl10", not "stl". As a result, the model is still built with 10 classes. This likely hurts performance and is presumably not intended.
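For reference, the mismatch can be sketched like this (variable names are my assumptions, not the exact identifiers in uda_release; the 9-class count comes from the fact that CIFAR-10 and STL-10 share all classes except CIFAR-10's "frog" and STL-10's "monkey"):

```python
# CIFAR-10 vs. STL-10 label sets, with CIFAR-10's "automobile"
# normalized to STL-10's "car" (same class, different name).
cifar10 = {'airplane', 'car', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck'}
stl10 = {'airplane', 'bird', 'car', 'cat', 'deer',
         'dog', 'horse', 'monkey', 'ship', 'truck'}
num_shared = len(cifar10 & stl10)  # 9 overlapping classes

# The bash script passes "stl10", so a check against "stl" never fires:
target = 'stl10'
num_classes = 9 if target == 'stl' else 10          # buggy: stays 10
num_classes_fixed = 9 if target == 'stl10' else 10  # fixed: becomes 9
print(num_shared, num_classes, num_classes_fixed)   # 9 10 9
```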

Further, I ran the provided scripts aiming to reproduce the numbers in the paper Table for CIFAR10->STL. However, they do not match:

I ran the scripts here as given.

Output from show_table.py:

cifar_stl source only accuracy: 68.28
output/cifar_stl_r/loss.pth   best accuracy: 74.67
output/cifar_stl_r/loss.pth   mmd select accuracy: 69.96
output/cifar_stl_rq/loss.pth  best accuracy: 75.96
output/cifar_stl_rq/loss.pth  mmd select accuracy: 72.17
output/cifar_stl_rqf/loss.pth best accuracy: 76.81
output/cifar_stl_rqf/loss.pth mmd select accuracy: 73.97

Expected performance from the paper table: R: 81.2, RLF: 82.1
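To quantify the mismatch (assuming the r and rqf runs correspond to the paper's R and RLF rows, which is my reading, not something stated in the repo):

```python
paper = {'R': 81.2, 'RLF': 82.1}       # numbers from the paper table
obtained = {'R': 74.67, 'RLF': 76.81}  # "best accuracy" from show_table.py
gap = {k: round(paper[k] - obtained[k], 2) for k in paper}
print(gap)  # {'R': 6.53, 'RLF': 5.29} -- roughly 5-7 points below the paper
```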

This is a very large mismatch. We would like to use your model+method in our ICLR submission. However, we can only do so if we can reproduce the numbers from your paper :(. Am I doing something wrong? :(

yueatsprograms commented 3 years ago

Sorry for the problems. Have you tried using stl instead of stl10 (as you discovered) for target?

Yu

EvgeniaAR commented 3 years ago

thanks for the fast response :).

Using "stl" instead of "stl10" doesn't work, because the dataloader expects the argument to be "stl10" ;). Instead, I changed the line that sets the number of classes to 9 so that it checks for "stl10", and then it worked.

Could you please comment on the mismatching values if running the code?

yueatsprograms commented 3 years ago

My question was meant to ask: what are the results after you fixed the target argument? Or, if those are the fixed results, what were the originals?

EvgeniaAR commented 3 years ago

Oh, sorry! The results I posted were with the original code, i.e. with 10 classes. I tried fixing the bug and using 9 classes, but did not get better results that way :(.

yueatsprograms commented 3 years ago

I dug up a plot from more than two years ago, when I was working on this paper. Does your plot look similar? The red and magenta lines are rotation only and quadrant only. Sorry, I never spent enough time on the codebase, since the paper was never published.

cifar_stl_plot.pdf

EvgeniaAR commented 3 years ago

Thanks for the effort :). I think you used different hyperparameters here. When one just runs the code as given, one gets the following curves: cifar-stl-r.pdf cifar-stl-rqf.pdf

It seems that in my case the learning rate decay doesn't have the same effect as in yours. It's a bit weird, and I definitely did not disable the scheduler :(.

You don't happen to know which hyperparameters the curve you posted above was obtained with?
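For context on the scheduler discussion: step decay multiplies the learning rate by a factor gamma at fixed milestone epochs, so a run whose decay "doesn't have the same effect" usually differs in milestones, gamma, or base rate. A minimal pure-Python sketch (illustrative values, not the repo's actual schedule):

```python
def lr_at_epoch(base_lr, epoch, milestones, gamma=0.1):
    """Step decay: multiply base_lr by gamma once per milestone passed."""
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed

# e.g. base lr 0.1 decayed at epochs 50 and 65 (hypothetical milestones):
schedule = [lr_at_epoch(0.1, e, [50, 65]) for e in (0, 49, 50, 65)]
```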

yueatsprograms commented 3 years ago

It seems like the only difference is the width hyperparameter, which is 8 in the script that I think was run at the time. That was likely because I didn't have enough time to run the wider model, but maybe 8 is the magic number...

EvgeniaAR commented 3 years ago

I ran the script with width=8 and both 9 and 10 classes and here are the results:

dataset        num_classes  error
cifar-stl-r    9            32.1%
cifar-stl-r    10           30.3%
cifar-stl-rqf  9            28.0%
cifar-stl-rqf  10           27.4%

The results are for the epoch chosen by the MMD+source error heuristic. I think the results with width=8 are actually worse than with width=16 :(.
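Converting the MMD-selected accuracies from the earlier width=16 run into errors makes the comparison direct (a quick check using only the numbers already posted in this thread):

```python
# mmd select accuracies from the width=16 run (show_table.py output above)
width16_acc = {'cifar-stl-r': 69.96, 'cifar-stl-rqf': 73.97}
width16_err = {k: round(100 - v, 2) for k, v in width16_acc.items()}

# width=8, num_classes=10 rows from the table above
width8_err = {'cifar-stl-r': 30.3, 'cifar-stl-rqf': 27.4}

# width=16 gives the lower error on both runs
better = {k: width16_err[k] < width8_err[k] for k in width16_err}
print(width16_err, better)
```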