izmailovpavel / neurips_bdl_starter_kit


HMC CIFAR10 checkpoint sample performance mismatch #6

Closed timxzz closed 1 year ago

timxzz commented 1 year ago

Hello Pavel!

I was trying to run the HMC checkpoints provided here, but I noticed that if I simply run the Colab notebook on the full CIFAR10 test set, the test accuracy is 0.8247, which doesn't match the accuracy stored in ckpt_dict["accuracy"] (i.e., 0.9601807) for the default checkpoint (chain 0, sample 152).

To reproduce it, you can simply change the notebook cell from

num_inputs = 100

x_test, y_test = test_set
input, target = x_test[0, :num_inputs], y_test[0, :num_inputs]

to

x_test, y_test = test_set
input, target = x_test[0], y_test[0]

and run all.

Did I do something wrong? How can I reproduce the accuracy from ckpt_dict["accuracy"] for a given checkpoint?

Thanks a lot!!

izmailovpavel commented 1 year ago

Hi @timxzz, I am not sure what you are doing exactly. Could you put your entire code on Colab or in a GitHub gist so I can take a look? My understanding is that you are looking at individual HMC samples, which will not have high accuracy on their own. Also, 96% is way too high for our model on the test set. Could you check the accuracy on the train set? I wonder if that's the accuracy stored in ckpt_dict (sorry, I worked on this project many years ago and don't remember all the details).
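To illustrate the point about individual samples, here is a minimal sketch of how the accuracy of a single sample can differ from the accuracy of the averaged predictions over several samples. It uses toy probabilities, not the actual HMC checkpoints, and the helper names `accuracy` and `ensemble_accuracy` are hypothetical; they are not part of the starter kit.

```python
import numpy as np

def accuracy(probs, labels):
    """Fraction of examples whose argmax class matches the label."""
    return float(np.mean(np.argmax(probs, axis=-1) == labels))

def ensemble_accuracy(sample_probs, labels):
    """Accuracy of the predictive distribution averaged over samples.

    sample_probs has shape (num_samples, num_examples, num_classes).
    """
    return accuracy(np.mean(sample_probs, axis=0), labels)

# Toy data (made up for illustration): 3 samples, 3 examples, 2 classes.
# Each sample misclassifies a different example.
labels = np.array([0, 0, 0])
sample_probs = np.array([
    [[0.9, 0.1], [0.9, 0.1], [0.4, 0.6]],
    [[0.9, 0.1], [0.4, 0.6], [0.9, 0.1]],
    [[0.4, 0.6], [0.9, 0.1], [0.9, 0.1]],
])

single = [accuracy(p, labels) for p in sample_probs]
averaged = ensemble_accuracy(sample_probs, labels)
print(single, averaged)
```

In this toy case every individual sample gets one example wrong, while the averaged prediction gets all of them right. Real checkpoints will not behave this cleanly, but averaging probabilities over many samples typically helps.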

timxzz commented 1 year ago

Hi @izmailovpavel, thank you for your reply!! Here is a Colab based on your code; you can just run all. Link

I did give it a try on the training set. I think you are right: the accuracy stored at ckpt_dict["accuracy"] is very likely the training accuracy. I guess you might want to update the notebook documentation from

- `"accuracy"` — sample accuracy on the test set

to

- `"accuracy"` — sample accuracy on the train set
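For anyone who wants to repeat this check, a rough sketch of the comparison is below. It assumes a hypothetical `predict_fn` that maps a batch of inputs to logits; the notebook's own prediction function may have a different signature, and `batched_accuracy` and `matching_splits` are illustrative names rather than starter-kit functions.

```python
import numpy as np

def batched_accuracy(predict_fn, inputs, labels, batch_size=1000):
    """Accuracy of predict_fn over a whole split, evaluated batch by batch."""
    num_correct = 0
    for start in range(0, len(inputs), batch_size):
        stop = start + batch_size
        # predict_fn is an assumption: it should return per-class logits.
        logits = np.asarray(predict_fn(inputs[start:stop]))
        preds = np.argmax(logits, axis=-1)
        num_correct += int(np.sum(preds == labels[start:stop]))
    return num_correct / len(inputs)

def matching_splits(stored_accuracy, split_accuracies, tol=1e-3):
    """Names of the splits whose measured accuracy is within tol of the stored value."""
    return [
        name
        for name, acc in split_accuracies.items()
        if abs(acc - stored_accuracy) <= tol
    ]
```

Usage would be to compute `batched_accuracy` on both the train and the test split with the same checkpoint, then pass the two numbers together with `ckpt_dict["accuracy"]` to `matching_splits` to see which split the stored value corresponds to.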

Anyway, thanks for your reply, it's really helpful!