Closed: jsbaan closed this issue 1 year ago.
Hi Joris,
Thank you for your interest in our work!
In early experiments, we used the first line you mention to address numerical issues. This was not used in our final experiments and was accidentally included in our release version. I've updated the released version of the code to reflect that.
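For context, here is a minimal sketch of the kind of stabilization that line performed, in the spirit of the standard log-sum-exp shift. The variable names and values are illustrative, not the repository's exact code:

```python
import torch

# Illustrative sketch only, not the repository's exact code.
# Each row of `log_likelihoods` stands in for per-sample log-likelihoods
# averaged across models (the role played by `mean_across_models`).
log_likelihoods = torch.tensor([[-2.3, -5.1, -0.7],
                                [-1.2, -4.0, -3.3]])

# Log of the summed (unnormalized) likelihood per row.
log_sums = torch.logsumexp(log_likelihoods, dim=1)

# Subtracting the largest of these values shifts every log-likelihood down,
# so a later exponentiation stays well inside floating-point range.
shifted = log_likelihoods - torch.max(torch.nan_to_num(log_sums, nan=-100))
print(shifted)
```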
The `llh_shift` is a heuristic way of accounting for the fact that we're summing up unnormalized likelihoods when computing the semantic entropy; this happens because we use length-normalized sequence likelihoods. By applying the `llh_shift`, we ensure that all the semantic log-likelihoods are negative before computing the Monte Carlo estimate of the entropy over them.
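To make the heuristic concrete, a minimal sketch of that step is below. The example values and the particular choice of `llh_shift` are hypothetical; the repository uses its own constant:

```python
import torch

# Illustrative sketch of the llh_shift heuristic (values are made up).
# `aggregated_likelihoods` holds one unnormalized semantic log-likelihood per
# sampled answer; because they come from length-normalized sequence
# likelihoods, some entries can be positive.
aggregated_likelihoods = [0.8, -0.3, 1.2, -1.5]

# Hypothetical choice of shift: just above the largest value, so every
# shifted entry becomes negative. The repository fixes its own constant.
llh_shift = torch.amax(torch.tensor(aggregated_likelihoods)) + 1e-6

shifted = torch.tensor(aggregated_likelihoods) - llh_shift

# Monte Carlo estimate of the semantic entropy: the negative mean of the
# (now strictly negative) semantic log-likelihoods.
semantic_entropy = -torch.mean(shifted)
print(semantic_entropy)
```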
I hope this helps! Lorenz
Got it, thank you
Hey, thanks a lot for publishing the code!
I was hoping you could shed some light on the following two lines.
First, I don't immediately see why `torch.max(torch.nan_to_num(torch.logsumexp(mean_across_models, dim=1), nan=-100))` is subtracted from `mean_across_models`:
https://github.com/lorenzkuhn/semantic_uncertainty/blob/2cbd5a5d4ac5386ebe98526b302e5fda3d4f5a65/code/compute_confidence_measure.py#L126

Second, what does `llh_shift` represent, and why is it subtracted from `torch.tensor(aggregated_likelihoods)`?
https://github.com/lorenzkuhn/semantic_uncertainty/blob/2cbd5a5d4ac5386ebe98526b302e5fda3d4f5a65/code/compute_confidence_measure.py#L136

Let me know if you want me to be more specific.
Thanks again! Joris