A package to evaluate the factuality of long-form generation. This is the original implementation of our EMNLP 2023 paper, "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation".
LLAMA weight differences ---
sum: tensor(0.0013)
average: tensor(1.3684e-13)
as a baseline, differences between LLAMA and Inst-LLAMA:
sum: tensor(7083599.5000)
avg: tensor(0.0005)
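
The figures above can be read as the sum and the mean of element-wise differences between two sets of weights. The following is a minimal sketch of one way to compute such statistics, not the script that produced this output. It assumes absolute differences are used, and the function name `weight_diff_stats`, the parameter names, and the toy values are all hypothetical. Plain Python lists stand in for tensors so the example runs without dependencies; with PyTorch tensors, the per-parameter sum would be `(a - b).abs().sum()`.

```python
import math


def weight_diff_stats(weights_a, weights_b):
    """Return (sum, average) of absolute element-wise differences.

    Both arguments map parameter names to flat sequences of floats.
    Taking absolute values is an assumption of this sketch.
    """
    if weights_a.keys() != weights_b.keys():
        raise ValueError("checkpoints have different parameter names")

    diffs = []
    for name, values_a in weights_a.items():
        values_b = weights_b[name]
        if len(values_a) != len(values_b):
            raise ValueError(f"size mismatch for parameter {name!r}")
        diffs.extend(abs(a - b) for a, b in zip(values_a, values_b))

    if not diffs:
        raise ValueError("no weights to compare")

    total = math.fsum(diffs)
    return total, total / len(diffs)


# Toy stand-ins for two checkpoints (hypothetical values).
base = {"layer.weight": [0.5, -1.0, 2.0], "layer.bias": [0.0]}
tuned = {"layer.weight": [0.5, -1.25, 2.5], "layer.bias": [0.25]}

total, average = weight_diff_stats(base, tuned)
print(f"sum: {total}")
print(f"average: {average}")
```

A sum near zero, as in the first pair of numbers, would indicate that the two sets of weights match up to floating-point error, while the baseline comparison shows how large the statistics become for two genuinely different models.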