Reproducing Figure 3 of the paper

shgoshtasb commented 3 years ago

Hi,

I'm trying to run the experiment in Section 4.3 of the paper using python3 experiments/scaling_binning_calibrator/compare_calibrators.py, but it throws the following error:

File "experiments/scaling_binning_calibrator/compare_calibrators.py", line 11
    def eval_top_calibration(probs, probs, labels):
    ^
SyntaxError: duplicate argument 'probs' in function definition

Also, the functions eval_top_calibration, upper_bound_marginal_calibration_unbiased and upper_bound_marginal_calibration_biased have the same problem.
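For anyone hitting the same error: Python rejects a repeated parameter name when it compiles the file, so nothing in the script runs until every affected signature is changed. A minimal sketch of that behaviour follows; the replacement name raw_probs is purely hypothetical and is not taken from the repository.

```python
# Minimal sketch: duplicate parameter names fail at compile time.
# The signature strings are illustrative; "raw_probs" is a hypothetical
# replacement name, not the name used in the actual fix.
broken_src = "def eval_top_calibration(probs, probs, labels):\n    return None\n"
fixed_src = "def eval_top_calibration(raw_probs, probs, labels):\n    return None\n"


def compiles(src):
    """Return (True, None) if src compiles, else (False, error message)."""
    try:
        compile(src, "<sketch>", "exec")
        return True, None
    except SyntaxError as err:
        return False, err.msg


broken_ok, broken_msg = compiles(broken_src)
fixed_ok, fixed_msg = compiles(fixed_src)

print(broken_ok, broken_msg)  # the duplicate-argument definition is rejected
print(fixed_ok)               # distinct parameter names compile fine
```

Because the check happens at compile time, a single bad signature anywhere in the module is enough to stop the whole script.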

I think that in eval_top_calibration we should pass probs = utils.get_top_probs(probs) to cal.get_discrete_bins on line 13.
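To make the proposed change concrete, here is a rough sketch of what reducing a matrix of class probabilities to top-label probabilities could look like. Both helpers below are hypothetical stand-ins written in plain Python; they are not the library's utils.get_top_probs or any other function from the repository, and the real implementation may differ.

```python
# Hypothetical stand-ins, assuming "top probability" means the largest
# predicted class probability for each example.
def top_probs_sketch(probs):
    """Return the highest predicted probability for each example."""
    return [max(row) for row in probs]


def top_correct_sketch(probs, labels):
    """Return 1 where the argmax class matches the label, else 0."""
    preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
    return [int(pred == label) for pred, label in zip(preds, labels)]


# Toy data: three examples, three classes.
probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.4, 0.5, 0.1]]
labels = [0, 2, 0]

top = top_probs_sketch(probs)                # one confidence per example
correct = top_correct_sketch(probs, labels)  # whether the top class was right

print(top)
print(correct)
```

The point of the sketch is only that the binning step would then receive one confidence value per example rather than a full row of class probabilities.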

But after changing those lines, I'm still unable to reproduce the plots in Figure 3 of the paper. Can you please tell me what I should modify to make it work?

AnanyaKumar commented 3 years ago

Sorry for the very late response!

Looks like I introduced some issues in a more recent change I made. I'll get these fixed soon.

In general, for reproducibility, the best source might be https://worksheets.codalab.org/worksheets/0xb6d027ee127e422989ab9115726c5411

This Codalab worksheet contains the exact code to reproduce experiments in the paper. For example, for the experiment you're looking at I believe you want the bundle: https://worksheets.codalab.org/bundles/0xd9d037f3de8a4b31be4072f3b75735b1 and under "dependencies" you can see the exact code that produced the run. You can also download the code and Docker containers.

In any case, will get this fixed asap!

AnanyaKumar commented 3 years ago

This should be fixed! Let me know if you have any issues. For ImageNet logits, please download them at: https://worksheets.codalab.org/bundles/0x81c9c8a9bf6c47f59f45f6fc80790c3c and put them into the data folder (otherwise the ImageNet experiment won't run).

AnanyaKumar commented 3 years ago

Closing for now, but feel free to reopen if you have follow-up questions or it doesn't work!

p-lambda / verified_calibration