shmsw25 / FActScore

A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
https://arxiv.org/abs/2305.14251
MIT License

Moving abstain detection to a separate module. #20

Closed: martiansideofthemoon closed this 1 year ago

martiansideofthemoon commented 1 year ago

Fixes #19

Before:

(dipper-venv) kalpesh@arkham:FActScore$ python factscore/factscorer.py --input_path data/labeled/ChatGPT.jsonl --n_samples 30 
[nltk_data] Downloading package punkt to /home/kalpesh/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
06/25/2023 12:22:21 - root - Estimated OpenAI API cost for atomic fact generation ($0.020 per 1000 tokens): $0.00 for 0 words and 0 tokens
06/25/2023 12:22:25 - root - Estimated OpenAI API cost for factscore evaluation ($0.002 per 1000 tokens): $0.00 for 0 words and 0 tokens
06/25/2023 12:22:25 - root - FActScore = 41.6%
06/25/2023 12:22:25 - root - Respond ratio = 100.0%
06/25/2023 12:22:25 - root - # Atomic facts per valid response = 19.8
(dipper-venv) kalpesh@arkham:FActScore$

After:

(dipper-venv) kalpesh@arkham:FActScore$ python factscore/factscorer.py --input_path data/labeled/ChatGPT.jsonl --n_samples 30 --abstain_detection generic
[nltk_data] Downloading package punkt to /home/kalpesh/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
06/25/2023 12:21:50 - root - Estimated OpenAI API cost for atomic fact generation ($0.020 per 1000 tokens): $0.00 for 0 words and 0 tokens
06/25/2023 12:21:53 - root - Estimated OpenAI API cost for factscore evaluation ($0.002 per 1000 tokens): $0.00 for 0 words and 0 tokens
06/25/2023 12:21:53 - root - FActScore = 38.6%
06/25/2023 12:21:53 - root - Respond ratio = 56.7%
06/25/2023 12:21:53 - root - # Atomic facts per valid response = 29.9
(dipper-venv) kalpesh@arkham:FActScore$
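
The drop in respond ratio between the two runs is what you would expect once abstained responses are filtered out before scoring. As a rough illustration only, here is a minimal sketch of what a standalone abstain-detection helper could look like. The names `ABSTAIN_PATTERNS`, `is_abstained`, and `respond_ratio`, and the phrase list itself, are hypothetical and are not taken from this PR; the actual module may use different phrases and a different interface.

```python
import re

# Hypothetical refusal phrases for illustration; the phrases actually
# matched by the "generic" mode in this PR are not shown in this thread.
ABSTAIN_PATTERNS = [
    r"\bi (?:do not|don't) have (?:any )?information\b",
    r"\bi(?:'m| am) sorry\b",
    r"\bi (?:could not|couldn't|cannot|can't) find\b",
]
_ABSTAIN_RE = re.compile("|".join(ABSTAIN_PATTERNS), re.IGNORECASE)


def is_abstained(response: str) -> bool:
    """Return True when the response looks like a refusal rather than an answer."""
    return _ABSTAIN_RE.search(response) is not None


def respond_ratio(responses) -> float:
    """Fraction of responses that were not flagged as abstentions."""
    if not responses:
        return 0.0
    answered = sum(not is_abstained(r) for r in responses)
    return answered / len(responses)


if __name__ == "__main__":
    samples = [
        "I'm sorry, I don't have any information about that person.",
        "Ada Lovelace was an English mathematician.",
        "I could not find details on this topic.",
    ]
    print(f"Respond ratio = {respond_ratio(samples):.1%}")
```

Assuming `--n_samples 30` means 30 responses were scored, a respond ratio of 56.7% is consistent with 17 of the 30 being kept (17/30 ≈ 56.7%). Keeping the detector in its own function means the scorer only needs a boolean per response, so other detection modes can be added without touching the scoring loop.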