lines 272
elif method == 'relaxed_accuracy':
ret['gt'] = answers
ret['pred'] = line['prediction'].strip()
ret['match'] = [relaxed_correctness(ret['pred'], x) for x in ret['gt']]
but function relaxed_correctness is
relaxed_correctness(target: str,
prediction: str,
max_relative_change: float = 0.05
lines 171
# https://github.com/google-research/pix2struct/blob/main/pix2struct/metrics.py#
def relaxed_correctness(target: str,
prediction: str,
max_relative_change: float = 0.05) -> bool:
"""Calculates relaxed correctness.
The correctness tolerates certain error ratio defined by max_relative_change.
See https://arxiv.org/pdf/2203.10244.pdf, end of section 5.1:
“Following Methani et al. (2020), we use a relaxed accuracy measure for the
numeric answers to allow a minor inaccuracy that may result from the automatic
data extraction process. We consider an answer to be correct if it is within
5% of the gold answer. For non-numeric answers, we still need an exact match
to consider an answer to be correct.”
Args:
target: Target string.
prediction: Predicted string.
max_relative_change: Maximum relative change.
Returns:
Whether the prediction was correct given the specified tolerance.
"""
def _to_float(text: str) -> Optional[float]:
try:
if text.endswith('%'):
# Convert percentages to floats.
return float(text.rstrip('%')) / 100.0
else:
return float(text)
except ValueError:
return None
prediction = str(prediction)
target = str(target)
https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/dataset/image_vqa.py line 57
https://github.com/open-compass/VLMEvalKit/blob/main/vlmeval/dataset/utils/vqa_eval.py
but function relaxed_correctness is relaxed_correctness(target: str, prediction: str, max_relative_change: float = 0.05
lines 171