Open orriduck opened 5 months ago
Hi @orriduck
The two numbers are the selfcheck scores. As there are two sentences to be assessed, there are two numbers (e.g., [0.334014 0.975106] for the NLI example). The first number is the score of the first sentence, and the second number is the score of the second sentence.
For each score, a higher number means a higher chance of being non-factual (i.e., hallucination). The scores are bounded between 0.0 and 1.0 for BERTScore, QA, NLI, and LLM-prompting variants.
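To make the reading of the scores concrete, here is a minimal sketch that pairs each sentence with its score and flags the likely hallucinations. The two scores are the NLI numbers quoted above; the placeholder sentences, the `flag_sentences` helper, and the 0.5 threshold are assumptions for illustration and are not part of the library.

```python
# Scores copied from the NLI example quoted above. The sentences and the
# threshold are hypothetical placeholders, not values from the library.
sentences = ["<first sentence>", "<second sentence>"]
scores = [0.334014, 0.975106]

THRESHOLD = 0.5  # assumed cut-off; tune it on your own data


def flag_sentences(sentences, scores, threshold=THRESHOLD):
    """Pair each sentence with its score and a 'likely hallucination' flag."""
    if len(sentences) != len(scores):
        raise ValueError("expected exactly one score per sentence")
    return [(s, sc, sc > threshold) for s, sc in zip(sentences, scores)]


flags = flag_sentences(sentences, scores)
for sentence, score, flagged in flags:
    print(f"{score:.3f}  flagged={flagged}  {sentence}")
```

With this assumed threshold, only the second sentence is flagged, since its score is the higher of the two.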
Hi @potsawee,
Thanks for the help. May I ask a follow-up question about the sampled_passage?
More specifically I wonder
Hi @orriduck
Yes, the motivation for selfcheck is that if one asks an LLM multiple times about the same thing (i.e., using the same prompt to the LLM) -- one can obtain $S_0, S_1, S_2, ..., S_N$ responses when asking $N+1$ times.
In this scenario, we can use the sampled passages ($S_1, S_2,..., S_N$) as the evidence to (self)-check $S_0$. If most of the sampled passages disagree with $S_0$, it may indicate a high chance of being a hallucination.
So yes, the sampled passages are required to compute the selfcheck score.
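The idea can be sketched with a toy scorer. This is not the scoring used by any selfcheck variant: `toy_selfcheck` is a hypothetical helper that treats a sampled passage as supporting a sentence only if it contains every word of that sentence, and the example texts are made up. It is only meant to show that the output has one score per sentence of $S_0$, whatever the number of sampled passages.

```python
import re


def tokens(text):
    """Lower-cased set of alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_selfcheck(sentences, sampled_passages):
    """Return one score per sentence: the fraction of sampled passages
    that do NOT contain all of the sentence's words (toy stand-in for
    a real consistency measure such as NLI)."""
    if not sampled_passages:
        raise ValueError("at least one sampled passage is required")
    scores = []
    for sentence in sentences:
        words = tokens(sentence)
        unsupported = sum(1 for p in sampled_passages if not words <= tokens(p))
        scores.append(unsupported / len(sampled_passages))
    return scores


# S_0 split into sentences (made-up example text)
sentences = ["Paris is the capital of France.", "It has 90 million residents."]
# S_1..S_N: other responses sampled with the same prompt (made-up)
sampled_passages = [
    "Paris is the capital of France. It has about 2 million residents.",
    "The capital of France is Paris.",
    "Paris is the capital of France and its largest city.",
]

toy_scores = toy_selfcheck(sentences, sampled_passages)
print(toy_scores)  # two sentences in, two scores out
```

Here three sampled passages still yield two scores, because the scores are indexed by the sentences being checked and not by the samples.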
Hi all, this may be a dumb question, but what do the two numbers in the example result mean? Are they related to the length of sampled_passage? Will the result always be two numbers?