❓ Questions and Help

Hi, is there something like a confidence score or other metric that I can use to evaluate how confident the model is in the summary it generates? Specifically, I am using it for question answering, and some questions may not have an answer in the given text. Is there a way to detect when there is no sufficiently clear answer in the source text, e.g. when the model makes a less confident prediction? Thanks.

I doubt a confidence score would be a reliable metric for this setting. Detecting unanswerable questions is a non-trivial task in itself, and I doubt CTRLsum can do it well off the shelf. In the QA literature, people have had to use annotated unanswerable examples to explicitly train models to identify unanswerable questions, e.g. SQuAD v2.
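That said, if you just want a rough signal to rank or flag generations, you could look at the length-normalized log-probability of the generated sequence. Below is a minimal sketch assuming CTRLsum is loaded through a Hugging Face transformers port; the checkpoint name and the `question => source` prompt format are assumptions here, so adjust them to match how you actually load and prompt the model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed checkpoint name -- swap in whichever CTRLsum port/checkpoint you use.
MODEL_NAME = "hyunwoongko/ctrlsum-cnndm"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()

def answer_with_score(question: str, source: str, max_new_tokens: int = 64):
    """Generate an answer plus a rough, length-normalized log-probability score."""
    # Assumed prompt format: the question is used as the control prompt,
    # prepended to the source document.
    text = f"{question} => {source}"
    inputs = tokenizer(text, return_tensors="pt", truncation=True)

    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            num_beams=1,                     # greedy keeps the score math simple
            output_scores=True,
            return_dict_in_generate=True,
        )

    # Log-probability of each token the model actually generated.
    token_logprobs = model.compute_transition_scores(
        out.sequences, out.scores, normalize_logits=True
    )[0]
    avg_logprob = token_logprobs.mean().item()  # length-normalized score

    answer = tokenizer.decode(out.sequences[0], skip_special_tokens=True)
    return answer, avg_logprob

answer, score = answer_with_score(
    "Who wrote the report?",
    "The annual report was written by the finance team and released in March.",
)
print(answer, score)
```

Thresholding `avg_logprob` can flag some shaky generations, but as noted above it is not a reliable unanswerability detector on its own; for that you would likely need to fine-tune with annotated unanswerable examples along the lines of SQuAD v2.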
I see, thanks for the info.

aliencaocao closed this issue 2 years ago.