Scoring existing generated text

shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"

MIT License


Hi, thanks a lot for sharing your work.

I am wondering whether FActScore can be used to score any existing generated text. For example, in a QA task where the model generates an answer along with some reasoning (CoT-like), would it be possible to assign a 'factuality score' to the generation? Note that I am talking about the case where we have an existing KB to fact-check against (e.g. Wikipedia), but we don't know in advance which article(s) are relevant (unlike in your example, where you generate biographies and therefore already know which entity to look up in the KB).

I am looking for a way to get a 'factuality score' for a piece of text, which I can then use as a feature in a separate ML task. I would be grateful for any pointers or suggestions.

Sof

edit: clarified the question

Hi @sofi444, thanks for your interest in our work. This should technically be possible if the retrieval is good enough to find articles which are highly relevant to the underlying fact.

Our package currently uses a two-stage retrieval process: the first stage selects the article using SQL (ref), and the second ranks the passages with BM25 (ref). You could potentially combine the two into a single stage (very expensive with BM25Okapi), or use fuzzier matching in stage 1 to find relevant articles.
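
To make the "fuzzier matching in stage 1" idea concrete, here is a minimal sketch using only the Python standard library. It is not the package's code: the `passages` table layout, the function names `select_articles` and `rank_passages`, the title-overlap heuristic, and the toy data are all assumptions made for illustration, and the BM25 formula is written out by hand rather than taken from BM25Okapi.

```python
import math
import re
import sqlite3
from collections import Counter

K1, B = 1.5, 0.75  # commonly used BM25 parameters (assumed defaults)


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def select_articles(conn, claim, top_n=2):
    """Stage 1 (hypothetical): keep titles whose words overlap with the claim."""
    claim_tokens = set(tokenize(claim))
    scored = []
    for (title,) in conn.execute("SELECT DISTINCT title FROM passages"):
        title_tokens = tokenize(title)
        if not title_tokens:
            continue
        overlap = sum(t in claim_tokens for t in title_tokens) / len(title_tokens)
        if overlap > 0:
            scored.append((overlap, title))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:top_n]]


def rank_passages(conn, titles, claim):
    """Stage 2: rank the passages of the selected articles with BM25."""
    if not titles:
        return []
    marks = ",".join("?" * len(titles))
    rows = conn.execute(
        f"SELECT title, passage FROM passages WHERE title IN ({marks})", titles
    ).fetchall()
    docs = [Counter(tokenize(passage)) for _, passage in rows]
    lengths = [sum(doc.values()) for doc in docs]
    avg_len = sum(lengths) / len(docs)
    query_terms = set(tokenize(claim))
    scores = []
    for doc, length in zip(docs, lengths):
        score = 0.0
        for term in query_terms:
            tf = doc[term]
            if tf == 0:
                continue
            df = sum(1 for d in docs if term in d)
            # Non-negative IDF variant; differs slightly from other BM25 flavours.
            idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
            score += idf * tf * (K1 + 1) / (tf + K1 * (1 - B + B * length / avg_len))
        scores.append(score)
    order = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return [(rows[i][0], rows[i][1], scores[i]) for i in order]


# Toy knowledge base (made-up rows, one passage per row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passages (title TEXT, passage TEXT)")
conn.executemany(
    "INSERT INTO passages VALUES (?, ?)",
    [
        ("Marie Curie", "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity."),
        ("Marie Curie", "She won the Nobel Prize in Physics in 1903 and the Nobel Prize in Chemistry in 1911."),
        ("Pierre Curie", "Pierre Curie was a French physicist who shared the 1903 Nobel Prize in Physics."),
        ("Albert Einstein", "Albert Einstein developed the theory of relativity."),
    ],
)

claim = "Marie Curie won the Nobel Prize in Chemistry."
titles = select_articles(conn, claim)
ranked = rank_passages(conn, titles, claim)
print(titles)
print(ranked[0][1])
```

Because stage 1 only narrows the candidate set, BM25 in stage 2 runs over a handful of passages instead of the whole KB, which is what keeps the two-stage setup cheap. On a real Wikipedia dump the title-overlap heuristic would likely need replacing with something stronger (entity linking or a dense retriever, for instance).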

Note that fact checking for harder facts is an iterative process, and an ideal system probably needs to hop over multiple articles to reliably fact-check (our package is quite far from supporting that).

Hope this helps! Closing the issue for now, but feel free to re-open if you have more questions :)

Scoring existing generated text #26