potsawee / selfcheckgpt

SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models
MIT License

Question about random baseline #28

Closed WWWonderer closed 5 months ago

WWWonderer commented 5 months ago

Hi Potsawee,

Thank you for the nice open-source repository and clear code! I have a question about the random baseline that you used for the AUC measure: in the notebook, you simply computed the average of the gold labels and took it as the random baseline. However, I don't understand the reasoning behind it. What exactly do you mean by a random baseline? Is it the AUC value that we get by randomly guessing a value from a uniform distribution and comparing it to the gold label? What exactly does it mean?

Thank you and have a nice day!

potsawee commented 5 months ago

Hi @WWWonderer,

Sorry for my late response. Below is the explanation:

What we mean by “random baseline” is a baseline that, for any sentence, randomly predicts the positive class (e.g., non-factual) with probability $p$ and the negative class with probability $1-p$. Because these predictions are independent of the gold labels, the expected precision equals the proportion of positive examples, while the expected recall equals $p$. Thus, sweeping $p$ from $0.0$ to $1.0$ gives a horizontal line in Figure 5, and therefore AUC-PR = the proportion of positive examples.

More explanation of the random baseline can be found in this Cross Validated (Stack Exchange) answer: https://stats.stackexchange.com/a/266989.
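To make this concrete, here is a small simulation sketch (not the code from the notebook). It assumes synthetic gold labels with roughly 70% positives, and `random_baseline_pr` is a hypothetical helper written only for this illustration. For each $p$, the measured precision should stay close to the positive rate while the recall tracks $p$.

```python
import random


def random_baseline_pr(labels, p, rng):
    """Precision and recall of a baseline that predicts positive with probability p."""
    # Predictions are drawn independently of the gold labels.
    preds = [rng.random() < p for _ in labels]
    true_pos = sum(1 for y, pred in zip(labels, preds) if y == 1 and pred)
    predicted_pos = sum(preds)
    actual_pos = sum(labels)
    precision = true_pos / predicted_pos if predicted_pos else float("nan")
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return precision, recall


rng = random.Random(0)

# Assumption: synthetic gold labels, 1 = positive (e.g., non-factual), ~70% positives.
labels = [1 if rng.random() < 0.7 else 0 for _ in range(100_000)]
positive_rate = sum(labels) / len(labels)

# Sweep p: precision stays near the positive rate, recall stays near p.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    precision, recall = random_baseline_pr(labels, p, rng)
    print(f"p={p:.1f}  precision={precision:.3f}  recall={recall:.3f}")

print(f"positive rate = random-baseline AUC-PR = {positive_rate:.3f}")
```

Since precision is flat at the positive rate across the whole recall range, the area under that PR curve is just the positive rate, which is the average of the gold labels.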

I hope this explanation answers your question.

WWWonderer commented 5 months ago

Hi Potsawee,

Thank you for your reply! It clears things up a lot.

Have a good day!