wjhou / Recap

Code for the paper "RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning" (EMNLP'23 Findings).
https://wjhou.github.io/Recap/
Apache License 2.0

Some questions in paper #1

Closed yjch00 closed 8 months ago

yjch00 commented 8 months ago

Hi there, thanks for sharing your code!

I have a couple of questions while reviewing your paper:

(1) In Table 2, the performance of ORGAN and RECAP for BLEU-2 seems different from Table 1. Is this intentional, or could there be a typo?

(2) In Table 2, the citation refers to Bannur et al. (2023), who mention using the same split as CXR-RePaiR-Sel. However, when I checked the code for CXR-RePaiR-Sel, it seems to use a different split (no Findings section, only Impression). Can you clarify whether you are using the same split as CXR-RePaiR-Sel?

(3) In Table 2, is the CheXbert F1 score the weighted-F1 score? (Bannur et al. (2023) say they use the weighted-F1 score.)

Thanks!

wjhou commented 8 months ago

Hi,

Thanks for your questions. Here are answers to these questions:

A1: I just checked the paper, and yes, it is a typo. I will correct it in a later version. Thank you for pointing it out.

A2: The results you mentioned are cited from "Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing".

A3: As mentioned in the paper, "The average of the weighted-F1 score across 14 pathological observations labelled by CheXbert." We used the macro-weighted F1 score over the labels produced by CheXbert. If I understand correctly, we use the same evaluation method. Please also see the response from the author of the R2Gen paper (https://github.com/zhjohnchan/R2GenCMN/issues/12). Similarly, in the ORGan paper, we use the same evaluation method (https://github.com/wjhou/ORGan/issues/4).
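
For concreteness, here is a minimal sketch of how such a score can be computed with scikit-learn's `f1_score`. The function name and the assumption that the CheXbert outputs have already been binarized into a 0/1 matrix over the 14 observations are illustrative, not the exact evaluation script used in the paper.

```python
# Minimal sketch (not the repository's evaluation code) of a CheXbert-based
# weighted-F1, assuming reference and generated reports have already been run
# through the CheXbert labeler and binarized into 0/1 matrices.
import numpy as np
from sklearn.metrics import f1_score

N_OBSERVATIONS = 14  # CheXbert's 14 pathological observations


def chexbert_weighted_f1(ref_labels: np.ndarray, hyp_labels: np.ndarray) -> float:
    """Weighted-F1 between reference and generated report labels.

    ref_labels, hyp_labels: arrays of shape (num_reports, 14) with 0/1 entries
    indicating whether CheXbert marked each observation as positive.
    """
    assert ref_labels.shape == hyp_labels.shape
    # 'weighted' averages the per-observation F1 scores, weighting each score
    # by its support (number of positive reference labels).
    return f1_score(ref_labels, hyp_labels, average="weighted", zero_division=0)


# Toy usage with random labels:
rng = np.random.default_rng(0)
refs = rng.integers(0, 2, size=(100, N_OBSERVATIONS))
hyps = rng.integers(0, 2, size=(100, N_OBSERVATIONS))
print(chexbert_weighted_f1(refs, hyps))
```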

Feel free to ask if you have other questions.

Best, Ethan

yjch00 commented 8 months ago

Thanks for your answers.

Maybe Table 1 and Table 2 use different splits, because Table 2 does not follow "Generating radiology reports via memory-driven transformer." (In "Retrieval-Based Chest X-Ray Report Generation Using a Pre-trained Contrastive Language-Image Model", they report 3,678 evaluated MIMIC-CXR test studies.)

(Table 1 follows "Generating radiology reports via memory-driven transformer," while Table 2 follows "Learning to Exploit Temporal Structure for Biomedical Vision–Language Processing" and "Retrieval-Based Chest X-Ray Report Generation Using a Pre-trained Contrastive Language-Image Model".)

wjhou commented 8 months ago

Yes, you are correct.

We cite the results from different papers. However, since they did not use the same experimental settings, we list them in separate tables. This is also why there is a "Sections" column, which clarifies that the results may not be directly comparable.

Since there are many research papers, and some of them use different experimental settings (e.g., discarding lateral-view images or using both Findings and Impression as the target report), we have tried our best to organize the results accordingly.

Feel free to reopen the issue if you have other questions.

Best, Ethan