Closed: jsbaan closed this issue 1 year ago.
Hi Joris,
Thank you for your interest in our work!
In early experiments, we used the first line you mention to address numerical issues. This was not used in our final experiments and was accidentally included in our release version. I've updated the released version of the code to reflect that.
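For context, here is a minimal sketch of the kind of stabilization that line performed, in the spirit of the standard log-sum-exp shift. The variable names and values are illustrative, not the repository's exact code:

```python
import torch

# Illustrative sketch only, not the repository's exact code.
# Each row of `log_likelihoods` stands in for per-sample log-likelihoods
# averaged across models (the role played by `mean_across_models`).
log_likelihoods = torch.tensor([[-2.3, -5.1, -0.7],
                                [-1.2, -4.0, -3.3]])

# Log of the summed (unnormalized) likelihood per row.
log_sums = torch.logsumexp(log_likelihoods, dim=1)

# Subtracting the largest of these values shifts every log-likelihood down,
# so a later exponentiation stays well inside floating-point range.
shifted = log_likelihoods - torch.max(torch.nan_to_num(log_sums, nan=-100))
print(shifted)
```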
The `llh_shift` is a heuristic way of accounting for the fact that we're summing up unnormalized likelihoods when computing the semantic entropy; this happens because we use length-normalized sequence likelihoods. By applying the `llh_shift`, we ensure that all the semantic log-likelihoods are negative before computing the Monte Carlo estimate of the entropy over them.
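To make the heuristic concrete, a minimal sketch of that step is below. The example values and the particular choice of `llh_shift` are hypothetical; the repository uses its own constant:

```python
import torch

# Illustrative sketch of the llh_shift heuristic (values are made up).
# `aggregated_likelihoods` holds one unnormalized semantic log-likelihood per
# sampled answer; because they come from length-normalized sequence
# likelihoods, some entries can be positive.
aggregated_likelihoods = [0.8, -0.3, 1.2, -1.5]

# Hypothetical choice of shift: just above the largest value, so every
# shifted entry becomes negative. The repository fixes its own constant.
llh_shift = torch.amax(torch.tensor(aggregated_likelihoods)) + 1e-6

shifted = torch.tensor(aggregated_likelihoods) - llh_shift

# Monte Carlo estimate of the semantic entropy: the negative mean of the
# (now strictly negative) semantic log-likelihoods.
semantic_entropy = -torch.mean(shifted)
print(semantic_entropy)
```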
I hope this helps! Lorenz
Got it, thank you
Hey, thanks a lot for publishing the code!
I was hoping you could shed some light on the following two lines.
First, I don't immediately see why `torch.max(torch.nan_to_num(torch.logsumexp(mean_across_models, dim=1), nan=-100))` is subtracted from `mean_across_models`:
https://github.com/lorenzkuhn/semantic_uncertainty/blob/2cbd5a5d4ac5386ebe98526b302e5fda3d4f5a65/code/compute_confidence_measure.py#L126

Second, what does `llh_shift` represent, and why is it subtracted from `torch.tensor(aggregated_likelihoods)`?
https://github.com/lorenzkuhn/semantic_uncertainty/blob/2cbd5a5d4ac5386ebe98526b302e5fda3d4f5a65/code/compute_confidence_measure.py#L136

Let me know if you want me to be more specific.
Thanks again! Joris