potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
MIT License

What do these 2 numbers mean in the example result? #29

Open orriduck opened 5 months ago

orriduck commented 5 months ago

Hi all, this may be a dumb question, but I just wanted to know what the two numbers in the example result mean. Are they related to the length of sampled_passage? Is the result always going to be 2 numbers?

potsawee commented 5 months ago

Hi @orriduck

The two numbers are the selfcheck scores. As there are two sentences to be assessed, there are two numbers (e.g., [0.334014 0.975106] for the NLI example). The first number is the score of the first sentence, and the second number is the score of the second sentence.

For each score, a higher number means a higher chance of being non-factual (i.e., hallucination). The scores are bounded between 0.0 and 1.0 for BERTScore, QA, NLI, and LLM-prompting variants.
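As a rough illustration of how to read these scores, the sketch below pairs each sentence with its score and flags the ones above a cut-off. The scores are the NLI example values quoted above; the placeholder sentences, the `flag_sentences` helper, and the 0.5 threshold are assumptions made for this example and are not part of selfcheckgpt.

```python
# Scores copied from the NLI example quoted above: one score per sentence.
scores = [0.334014, 0.975106]

# Hypothetical placeholders standing in for the two assessed sentences.
sentences = ["<first sentence>", "<second sentence>"]

# Assumed cut-off for illustration only; not a value recommended by the library.
THRESHOLD = 0.5


def flag_sentences(sentences, scores, threshold=THRESHOLD):
    """Pair each sentence with its score and a flag (True = score above threshold)."""
    if len(sentences) != len(scores):
        raise ValueError("expected exactly one score per sentence")
    return [(s, score, score > threshold) for s, score in zip(sentences, scores)]


for sentence, score, flagged in flag_sentences(sentences, scores):
    label = "possibly hallucinated" if flagged else "likely supported"
    print(f"{score:.3f}  {label}  {sentence}")
```

The number of scores follows the number of sentences being assessed, so a passage with five sentences would yield five scores regardless of how many sampled passages are supplied.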

orriduck commented 5 months ago

Hi @potsawee,

I appreciate the help. May I ask a follow-up about sampled_passage?

More specifically I wonder

potsawee commented 5 months ago

Hi @orriduck

Yes, the motivation for selfcheck is that if one asks an LLM multiple times about the same thing (i.e., using the same prompt to the LLM) -- one can obtain $S_0, S_1, S_2, ..., S_N$ responses when asking $N+1$ times.

In this scenario, we can use the sampled passages ($S_1, S_2,..., S_N$) as the evidence to (self)-check $S_0$. If most of the sampled passages disagree with $S_0$, it may indicate a high chance of being a hallucination.

So yes, the sampled passages are required to compute the selfcheck score.
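To make the idea concrete, here is a toy sketch of the "fraction of samples that disagree" intuition. It is not how selfcheckgpt scores sentences: the `toy_disagrees` check is a crude word-overlap stand-in for a real NLI, QA, or BERTScore model, and the function names, the 0.5 overlap cut-off, and the example texts are all made up for illustration.

```python
import re


def tokens(text):
    """Lowercased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_disagrees(sentence, passage, min_overlap=0.5):
    """Toy stand-in for a real consistency model: the passage 'disagrees'
    if it contains fewer than min_overlap of the sentence's tokens."""
    sent = tokens(sentence)
    if not sent:
        return False
    overlap = len(sent & tokens(passage)) / len(sent)
    return overlap < min_overlap


def toy_selfcheck(sentences, sampled_passages):
    """One score per sentence of S_0: the fraction of sampled passages
    (S_1..S_N) that disagree with it. Higher means more suspicious."""
    if not sampled_passages:
        raise ValueError("at least one sampled passage is required")
    scores = []
    for sentence in sentences:
        disagreements = sum(toy_disagrees(sentence, p) for p in sampled_passages)
        scores.append(disagreements / len(sampled_passages))
    return scores


# Made-up S_0 (split into sentences) and made-up samples S_1..S_3.
s0_sentences = [
    "Paris is the capital of France.",
    "It has a population of 90 million.",
]
samples = [
    "Paris is the capital of France. About 2 million people live there.",
    "The capital of France is Paris, a city of roughly 2 million.",
    "Paris, capital of France, is home to around 2 million residents.",
]

scores = toy_selfcheck(s0_sentences, samples)
print(scores)
```

In this sketch the output length depends only on the number of sentences in $S_0$; the sampled passages change the value of each score, not how many scores there are, and the score cannot be computed without them.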