shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

Using unlabeled data to generate atomic facts and retrieving evidence #22

Closed rubaha96 closed 1 year ago

rubaha96 commented 1 year ago

Hello! I have a few questions.

1) Do I understand correctly that in the current pipeline, atomic fact generation for unlabeled data uses both the unlabeled data itself and the demos data, even when we only need to process the unlabeled data? Why is the demos data always used in these computations?

2) Did you notice that the retrieved evidence is the same for all atomic facts within a particular bio? It seems it should depend on each individual atomic fact.

shmsw25 commented 1 year ago

Thank you for your question.

  1. The demos data consists of demonstrations (for in-context learning) for the Atomic Fact Generator. So it is part of the input the model conditions on, rather than test data. The annotations in the demos data are written by humans.
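
     A minimal sketch of how such in-context demonstrations can be prepended to the prompt when asking an LM to decompose a sentence into atomic facts. The helper name, prompt wording, and demo examples below are illustrative assumptions, not the repository's actual code:

     ```python
     # Hypothetical sketch: build a few-shot prompt from human-written demos.
     def build_atomic_fact_prompt(demos, sentence):
         """demos: list of (sentence, [atomic facts]) pairs written by humans."""
         parts = []
         for demo_sent, facts in demos:
             parts.append(f"Breakdown the following sentence into independent facts: {demo_sent}")
             parts.extend(f"- {fact}" for fact in facts)
             parts.append("")  # blank line between demonstrations
         # The actual test sentence comes last; the LM completes the fact list.
         parts.append(f"Breakdown the following sentence into independent facts: {sentence}")
         return "\n".join(parts)

     demos = [("He was born in 1950 in Paris.",
               ["He was born in 1950.", "He was born in Paris."])]
     prompt = build_atomic_fact_prompt(demos, "She won two awards in 2010.")
     ```

     The demos thus shape the format and granularity of the model's output without ever being evaluated themselves.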
  2. We validate each atomic fact against the Wikipedia article about the subject, so retrieval is restricted to that single article. Please refer to the implementation details in Section 4.1.2 of the paper. It is possible to relax this restriction, but we empirically found this setup to work better.
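
     To illustrate the second point, here is a toy sketch (assumed, not the package's actual retriever) of retrieval restricted to one article's passages, where each atomic fact still ranks passages independently, so different facts can surface different evidence from the same article:

     ```python
     # Hypothetical sketch: per-fact retrieval within a single Wikipedia article,
     # using simple word overlap as a stand-in for the real retriever's scoring.
     def retrieve_for_fact(fact, article_passages, top_k=3):
         """Rank the subject's article passages by word overlap with the fact."""
         fact_words = set(fact.lower().split())
         scored = sorted(
             article_passages,
             key=lambda passage: len(fact_words & set(passage.lower().split())),
             reverse=True,
         )
         return scored[:top_k]

     passages = [
         "Marie Curie was born in Warsaw in 1867.",
         "She received the Nobel Prize in Physics in 1903.",
     ]
     evidence = retrieve_for_fact("She was born in Warsaw.", passages, top_k=1)
     ```

     Even though the candidate pool is the same single article for every fact in a bio, the top-ranked evidence differs per fact.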