microsoft / RadFact

A metric suite leveraging the logical inference capabilities of LLMs, for radiology report generation both with and without grounding
MIT License
48 stars 6 forks source link

Radfact silently assumes that a comparison is not entailed if LLM request fails #5

Open javier-alvarez opened 1 month ago

javier-alvarez commented 1 month ago

`Run the processor on a single phrase.

    If LLM fails to respond, we return a default NOT_ENTAILMENT with no evidence.
    If LLM tries to rephrase the input, we log a warning and correct it.

`

Engine should retry with exponential retries and stop if a request cannot be completed. This will ensure all sentences are evaluated instead of assuming NOT_ENTAILMENT