potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models

Could you provide an example of using ngram to predict the factuality of a sentence? #16

tianyang-x opened this issue 1 year ago

tianyang-x commented 1 year ago

Hi @potsawee , thanks for sharing your awesome work.

However, when trying to run your code, I found that even though there is an n-gram model, no examples of its usage are provided. The n-gram model is quite different from the others, since its score ranges over $[0, \infty)$ while the others range over $[0, 1]$. I tried normalizing it with min-max scaling, $\frac{x - x_{\min}}{x_{\max} - x_{\min}}$, but this is probably not precise enough: the $x$ values within a single output can all be similarly high or similarly low, so stretching them across $[0, 1]$ is not a reliable normalization.
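Here is roughly what I tried, as a minimal sketch (`ngram_scores` is a hypothetical list of the per-sentence scores returned by the n-gram variant):

```python
def minmax_normalize(ngram_scores):
    # Min-max scaling of the per-sentence n-gram scores within one output.
    lo, hi = min(ngram_scores), max(ngram_scores)
    if hi == lo:
        # Every sentence scored the same, so min-max scaling is undefined.
        return [0.0 for _ in ngram_scores]
    return [(x - lo) / (hi - lo) for x in ngram_scores]

# When all scores are similarly high, tiny differences get stretched
# across the full [0, 1] range:
print(minmax_normalize([3.20, 3.30, 3.25]))  # ~ [0.0, 1.0, 0.5]
```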

I wonder if I have misunderstood something. Could you please suggest a better normalization method, point out any problems in my reasoning, and share an example of evaluating the hallucination score with the n-gram method? Thanks.

potsawee commented 1 year ago

Hi @Amano-Aki, Thanks for trying out the work!

The n-gram variant is, at the moment, meant for ranking candidates, e.g., (1) sentence-level detection, i.e., ranking sentences within the same document; or (2) passage-level ranking.
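For sentence-level ranking, here is a minimal sketch following the `SelfCheckNgram` usage shown in the README (all text inputs here are placeholders):

```python
from selfcheckgpt.modeling_selfcheck import SelfCheckNgram

# Placeholder inputs: the sentences of the passage being checked, the
# passage itself, and a few stochastically sampled re-generations
# produced by the same LLM.
sentences = [
    "Sentence one of the model's response.",
    "Sentence two of the model's response.",
]
passage = " ".join(sentences)
sampled_passages = [
    "First sampled passage from the same prompt.",
    "Second sampled passage from the same prompt.",
    "Third sampled passage from the same prompt.",
]

selfcheck_ngram = SelfCheckNgram(n=1)  # n=1 -> unigram model
scores = selfcheck_ngram.predict(
    sentences=sentences,
    passage=passage,
    sampled_passages=sampled_passages,
)

# scores["sent_level"]["avg_neg_logprob"] holds one unnormalized score in
# [0, inf) per sentence; higher = less supported by the samples.
# Ranking within the document, most-likely-hallucinated first:
order = sorted(
    range(len(sentences)),
    key=lambda i: scores["sent_level"]["avg_neg_logprob"][i],
    reverse=True,
)
```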

I only applied normalization ( $[0, \infty) \rightarrow [0, 1]$ ) when I tried combining the scores of different variants. However, for the AUC-PR and correlation results in the paper, I did not normalize the n-gram score.
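For combining, any order-preserving map from $[0, \infty)$ into $[0, 1]$ works in principle; here is one illustrative choice (just a sketch, not a prescribed recipe):

```python
import math

def squash(x: float) -> float:
    # Bounded, order-preserving transform from [0, inf) to [0, 1):
    # larger n-gram scores map to larger outputs.
    return 1.0 - math.exp(-x)

# e.g., averaging with a variant whose score already lies in [0, 1]:
ngram_score, other_score = 3.2, 0.7
combined = 0.5 * squash(ngram_score) + 0.5 * other_score
```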

Apologies for the mistake about the n-gram method in the current version of the paper on arXiv; an updated paper will be made available soon. Also, you could try out SelfCheck with NLI and SelfCheck with LLM-prompting (e.g., Llama 2) as well.
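For example, the NLI variant follows a similar interface, and its per-sentence scores already lie in $[0, 1]$, so no extra normalization is needed. A sketch with placeholder inputs:

```python
import torch
from selfcheckgpt.modeling_selfcheck import SelfCheckNLI

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
selfcheck_nli = SelfCheckNLI(device=device)

# Placeholders, as in the n-gram sketch above.
sentences = ["Sentence one of the model's response.", "Sentence two."]
sampled_passages = [
    "First sampled passage from the same prompt.",
    "Second sampled passage from the same prompt.",
]

sent_scores_nli = selfcheck_nli.predict(
    sentences=sentences,
    sampled_passages=sampled_passages,
)
# One score per sentence in [0, 1]; higher = more likely hallucinated.
```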