Closed joshbarua closed 1 year ago
Hi, I would also be interested in this code / a concrete explanation how the CLEV score is generated.
@joshbarua Did you get any more info here or figured out a way to compute it?
Thanks a lot!
Chantal
Hi Chantal, to compute the CLEV score, we do the following:
(1) run the CheXbert labeler on each generated and ground-truth NLE
(2) take the subset of the CheXbert-predicted labels that are "evidence labels" according to our graph
(3) check whether they are the same for both the generated and ground-truth NLE
(4) if yes, the pair counts as correct; if no, as incorrect
(5) the CLEV score is then the accuracy of this over the entire test set
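In case it helps, here is a minimal sketch of those steps in Python. It assumes the CheXbert labeler has already been run, so each NLE is represented by a set of predicted label names, and that `evidence_labels` (a hypothetical name) holds the labels our evidence graph counts as evidence:

```python
def clev_score(gen_labels_list, gt_labels_list, evidence_labels):
    """Accuracy of evidence-label agreement between generated and GT NLEs.

    gen_labels_list / gt_labels_list: one set of CheXbert-predicted label
    names per NLE, aligned by index; evidence_labels: the evidence labels
    from the graph.
    """
    assert len(gen_labels_list) == len(gt_labels_list)
    correct = 0
    for gen_labels, gt_labels in zip(gen_labels_list, gt_labels_list):
        # Keep only the predicted labels that are evidence labels.
        gen_ev = set(gen_labels) & evidence_labels
        gt_ev = set(gt_labels) & evidence_labels
        # A pair is correct iff the evidence-label sets match exactly.
        correct += int(gen_ev == gt_ev)
    return correct / len(gt_labels_list)
```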
Let me know if anything is unclear
Hi,
Thanks a lot for your reply, that is really helpful! :)
So am I correct in assuming that no explanation keywords (e.g., those mentioned in Table 1) are used during labeling for the CLEV score?
Best,
Chantal
Yes, that's correct :)
While implementing this, I came across another follow-up question.
In the evidence graph there is an "Other" node that is used as evidence for multiple labels. What does "Other" refer to, and how should it be taken into account when calculating the CLEV score?
Ah sorry, I didn't specify this earlier. "Other" refers to cases where no "known" evidence (i.e., none of the MIMIC labels) was found. When computing the metric, we excluded all predicted<->GT NLE pairs where no known evidence was found in the GT NLE.
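So, if I understand correctly, the earlier sketch would be adjusted roughly like this (again assuming per-NLE label sets from CheXbert and a hypothetical `evidence_labels` set; pairs whose GT NLE maps to "Other" are simply skipped):

```python
def clev_score_with_exclusion(gen_labels_list, gt_labels_list, evidence_labels):
    """CLEV accuracy, excluding pairs whose GT NLE has no known evidence."""
    correct, total = 0, 0
    for gen_labels, gt_labels in zip(gen_labels_list, gt_labels_list):
        gt_ev = set(gt_labels) & evidence_labels
        if not gt_ev:
            # "Other": no known evidence in the GT NLE -> pair is excluded.
            continue
        gen_ev = set(gen_labels) & evidence_labels
        total += 1
        correct += int(gen_ev == gt_ev)
    return correct / total if total else float("nan")
```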
Hello there,
With the code currently available in the repo, there is no way to generate a list of diagnosis labels and evidence labels for a new report. While the 'query' folder contains these labels, they all come directly from reports in the MIMIC-CXR dataset. If you could provide the tool or code you used to extract these two sets of labels based on the evidence graph from your paper, that would be incredibly helpful. The new metric you introduced, CLEV, cannot be used to evaluate generated text reports without this. Thank you in advance.