lorenzkuhn / semantic_uncertainty


Question on the computation of P(True) baseline #6

Open seonwoo-min opened 1 year ago

seonwoo-min commented 1 year ago

Hi @lorenzkuhn,

I wanted to bring to your attention a potential error in the computation of the P(True) baseline, unless I have misunderstood something here.

In the current code, only the first len(tokenized_base_prompt) target tokens are set to -100: https://github.com/lorenzkuhn/semantic_uncertainty/blob/27adbf0dc1bf056c771c205d89c2a79cbd82dc3a/code/get_prompting_based_uncertainty.py#L108-L113

However, this does not mask out all of the context tokens when computing the NLL loss, because prompt_true also includes the few_shot_prompt before the base_prompt: https://github.com/lorenzkuhn/semantic_uncertainty/blob/27adbf0dc1bf056c771c205d89c2a79cbd82dc3a/code/get_prompting_based_uncertainty.py#L105-L106
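
To make the suggestion concrete, here is a minimal sketch of the masking I would expect (illustrative only; `tokenizer`, `model`, `few_shot_prompt`, and `base_prompt` are assumed to be defined as in the script, and this is not the repo's exact code):

```python
import torch

# Build the "True" prompt as in the script: few-shot examples, then the
# base prompt, then the candidate answer token.
prompt_true = few_shot_prompt + base_prompt + ' True'
input_ids = torch.tensor(tokenizer(prompt_true)['input_ids'])

# Mask the *full* context (few-shot prompt + base prompt), not just the
# base prompt, so only the final answer token(s) contribute to the loss.
context_len = len(tokenizer(few_shot_prompt + base_prompt)['input_ids'])
target_ids = input_ids.clone()
target_ids[:context_len] = -100  # HF cross-entropy ignores -100 labels

loss = model(input_ids.unsqueeze(0), labels=target_ids.unsqueeze(0)).loss
```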

Not masking the entire context in the NLL computation could distort the resulting P(True) scores. Could you please provide some insight or clarification on this?

In addition, the current code only uses n_samples_to_use = 2000 samples for the P(True) baseline. Do the experimental settings for P(True) differ from those of the other baselines? I don't recall seeing this explained in the paper.