shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

FileNotFoundError (demons.json) for custom knowledge base #28

Open MathiasKraus opened 8 months ago

MathiasKraus commented 8 months ago

Hello,

First off, I'd like to express my appreciation for this great package you've developed. I'm in the process of testing a scenario where I evaluate the quality of generated summaries based on a custom knowledge base. Any guidance or pointers would be greatly appreciated!

For this purpose, I created the following knowledge.jsonl file:

{"title": "Gravity", "text": "Gravity is a force by which a planet or other body draws objects toward its center. The force of gravity keeps all of the planets in orbit around the sun."}
{"title": "Photosynthesis", "text": ["Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll pigments.", "In simple words, it is the process where plants make their own food using sunlight."]}
{"title": "Pythagorean Theorem", "text": "In mathematics, the Pythagorean theorem, also known as Pythagoras's theorem, is a fundamental relation in Euclidean geometry among the three sides of a right triangle. It states that the square of the hypotenuse is equal to the sum of the squares of the other two sides."}

and, following the example in the README, run the code:

from factscore.factscorer import FactScorer

fs = FactScorer(openai_key="...")
fs.register_knowledge_source("science_knowledge_base",
                             data_path="/content/knowledge.jsonl",
                             db_path="/content/knowledge_db")
topics = ["Gravity", "Photosynthesis", "Pythagorean Theorem"]
generations = ["Gravity is a force that draws objects toward the center of a planet or body, keeping planets in orbit around the sun.",
               "Photosynthesis allows plants and certain organisms to create food using sunlight and chlorophyll.",
               "This theorem in Euclidean geometry relates the three sides of a right triangle, stating that the hypotenuse's square is the sum of the squares of the other sides."]

out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")

On the last line, however, I receive the following error message:

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-47-58ee9f60532e> in <cell line: 2>()
      1 # now, when you compute a score, specify knowledge source to use
----> 2 out = fs.get_score(topics, generations, knowledge_source="science_knowledge_base")
      3 print (out["score"]) # FActScore
      4 print (out["respond_ratio"]) # % of responding (not abstaining from answering)
      5 print (out["num_facts_per_response"]) # average number of atomic facts per response

1 frames
/usr/local/lib/python3.10/dist-packages/factscore/factscorer.py in get_score(self, topics, generations, gamma, atomic_facts, knowledge_source, verbose)
    127         else:
    128             if self.af_generator is None:
--> 129                 self.af_generator = AtomicFactGenerator(key_path=self.openai_key,
    130                                                         demon_dir=os.path.join(self.data_dir, "demos"),
    131                                                         gpt3_cache_file=os.path.join(self.cache_dir, "InstructGPT.pkl"))

/usr/local/lib/python3.10/dist-packages/factscore/atomic_facts.py in __init__(self, key_path, demon_dir, gpt3_cache_file)
     27 
     28         # get the demos
---> 29         with open(self.demon_path, 'r') as f:
     30             self.demons = json.load(f)
     31 

FileNotFoundError: [Errno 2] No such file or directory: '.cache/factscore/demos/demons.json'

I'm trying to understand the role of demons.json and why it is necessary. I combed through the code but couldn't work out its purpose. Could you shed some light on this?

System: I am running this on Colab and installed the factscore package using pip install --upgrade factscore.

Thank you very much in advance!

MathiasKraus commented 8 months ago

I got it working by putting the demons.json file in the folder. However, I am wondering why I need it for a custom knowledge base. Could you help me understand this?

martiansideofthemoon commented 8 months ago

Hi @MathiasKraus, thanks a lot for your interest in our work!

The demonstrations are needed for atomic fact generation with davinci-003, which is used irrespective of the knowledge base. Did you run the download command described at the following link in your setup? It downloads all the needed data for you. https://github.com/shmsw25/FActScore#download-the-data

You could skip the --llama_7B_HF_path "llama-7B" flag here if you are only using OpenAI models.
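
As a rough illustration of the point above, here is a minimal pre-flight check that could be run before calling get_score, so a missing demonstrations file is reported up front rather than as a FileNotFoundError mid-run. The default directory and file name are taken from the traceback in this thread; the helper name missing_demo_files is hypothetical and not part of the factscore package, and the package may expect other files that this sketch does not know about.

```python
import os


def missing_demo_files(data_dir=".cache/factscore", filenames=("demons.json",)):
    """Return the expected demo files that are absent under <data_dir>/demos.

    Hypothetical helper, not part of factscore. The defaults mirror the path
    reported in the traceback above ('.cache/factscore/demos/demons.json').
    """
    demo_dir = os.path.join(data_dir, "demos")
    missing = []
    for name in filenames:
        path = os.path.join(demo_dir, name)
        if not os.path.isfile(path):
            missing.append(path)
    return missing


if __name__ == "__main__":
    missing = missing_demo_files()
    if missing:
        print("Missing demo files; run the download step from the README first:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected demo files were found.")
```

If the list comes back non-empty, running the data download step linked above should populate the directory.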