Closed joshbarua closed 1 year ago
Hi, I would also be interested in this code / a concrete explanation how the CLEV score is generated.
@joshbarua Did you get any more info here or figured out a way to compute it?
Thanks a lot!
Chantal
Hi Chantal, to compute the CLEV score, we do the following:
(1) run the CheXbert labeler on each generated and ground-truth NLE
(2) take the subset of the CheXbert-predicted labels that are "evidence labels" according to our graph
(3) check whether they are the same for both the generated and ground-truth NLE
(4) if yes, the pair counts as correct; if no, as incorrect
(5) the CLEV score is then the accuracy of this over the entire test set
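In case it helps, here is a minimal sketch of those steps in Python. It assumes the CheXbert labeler has already been run, so each NLE is represented by a set of predicted label names, and that `evidence_labels` (a hypothetical name) holds the labels our evidence graph counts as evidence:

```python
def clev_score(gen_labels_list, gt_labels_list, evidence_labels):
    """Accuracy of evidence-label agreement between generated and GT NLEs.

    gen_labels_list / gt_labels_list: one set of CheXbert-predicted label
    names per NLE, aligned by index; evidence_labels: the evidence labels
    from the graph.
    """
    assert len(gen_labels_list) == len(gt_labels_list)
    correct = 0
    for gen_labels, gt_labels in zip(gen_labels_list, gt_labels_list):
        # Keep only the predicted labels that are evidence labels.
        gen_ev = set(gen_labels) & evidence_labels
        gt_ev = set(gt_labels) & evidence_labels
        # A pair is correct iff the evidence-label sets match exactly.
        correct += int(gen_ev == gt_ev)
    return correct / len(gt_labels_list)
```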
Let me know if anything is unclear
Hi,
Thanks a lot for your reply, that is really helpful! :)
So am I correct in assuming that no explanation keywords (e.g., those mentioned in Table 1) are used during labeling for the CLEV score?
Best,
Chantal
Yes, that's correct :)
While implementing this, I came across another follow-up question.
In the evidence graph there is an "Other" node that is used as evidence for multiple labels. What does "Other" refer to, and how should it be taken into account when calculating the CLEV score?
Ah sorry, I didn't specify this earlier. "Other" refers to cases where no "known" evidence (i.e., none of the MIMIC labels) was found. When computing the metric, we excluded all predicted<->GT NLE pairs where no known evidence was found in the GT NLE.
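So, if I understand correctly, the earlier sketch would be adjusted roughly like this (again assuming per-NLE label sets from CheXbert and a hypothetical `evidence_labels` set; pairs whose GT NLE maps to "Other" are simply skipped):

```python
def clev_score_with_exclusion(gen_labels_list, gt_labels_list, evidence_labels):
    """CLEV accuracy, excluding pairs whose GT NLE has no known evidence."""
    correct, total = 0, 0
    for gen_labels, gt_labels in zip(gen_labels_list, gt_labels_list):
        gt_ev = set(gt_labels) & evidence_labels
        if not gt_ev:
            # "Other": no known evidence in the GT NLE -> pair is excluded.
            continue
        gen_ev = set(gen_labels) & evidence_labels
        total += 1
        correct += int(gen_ev == gt_ev)
    return correct / total if total else float("nan")
```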
Hello there,
With the code currently available in the repo, there is no way to generate a list of diagnosis labels and evidence labels for a new report. While the 'query' folder contains these labels, they all come directly from reports in the MIMIC-CXR dataset. If you could provide the tool or code you used to extract these two sets of labels based on the evidence graph from your paper, that would be incredibly helpful. The new metric you introduced, CLEV, cannot be used to evaluate generated text reports without this. Thank you in advance.