A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation"
Hi @rubaha96, these are the vocab ids for the tokens True and False in the LLaMA tokenizer. So these scores represent how likely the model is to complete the sequence with True or with False.
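To make that concrete, here is a minimal sketch of how two such logits can be turned into a decision. It uses a toy logit vector rather than real model output, and it assumes the two scores are simply compared against each other; the helper names `is_supported` and `true_probability` are hypothetical and not part of the package.

```python
import math

# Vocab ids quoted in this thread for the LLaMA tokenizer.
TRUE_ID = 5852
FALSE_ID = 7700


def is_supported(logits, true_id=TRUE_ID, false_id=FALSE_ID):
    """Hypothetical helper: True if the 'True' token outscores the 'False' token."""
    return logits[true_id] > logits[false_id]


def true_probability(logits, true_id=TRUE_ID, false_id=FALSE_ID):
    """Hypothetical helper: two-way softmax restricted to the True/False logits."""
    t, f = logits[true_id], logits[false_id]
    m = max(t, f)  # subtract the max for numerical stability
    exp_t, exp_f = math.exp(t - m), math.exp(f - m)
    return exp_t / (exp_t + exp_f)


# Toy next-token logits over an assumed 32000-entry vocabulary.
logits = [0.0] * 32000
logits[TRUE_ID] = 2.0   # pretend the model favours "True"
logits[FALSE_ID] = 0.0

supported = is_supported(logits)
p_true = true_probability(logits)
print(supported, round(p_true, 4))
```

The logits are the model's unnormalized scores for every vocabulary entry at the next position, so indexing with a token id reads off the score of that one token. Only the relative size of the two entries matters for the comparison; the two-way softmax is just one way to express the gap as a probability.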
factscore/factscorer.py lines 219-220, in _get_score:
true_score = logits[5852]
false_score = logits[7700]
Is there any intuition behind these particular logit indices, and how are the logits formed?