mbzuai-oryx / XrayGPT

[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.

Questions about the paper #16

Open · ResonWang opened this issue 1 year ago

ResonWang commented 1 year ago

Dear authors, I have some questions about the paper: (1) What are MedVicuna and RadVicuna in Table 1? I cannot find them in the paper or on the Internet. (2) According to Figure 1, it seems that only the Linear Transformation Layer is trained in the whole framework, so why do you state in the contributions that "The LLM (Vicuna) is fine-tuned on medical data"? (3) In your work, is only the Linear Transformation Layer trained, while the LLM and MedCLIP are both frozen?

OmkarThawakar commented 1 year ago

Dear @ResonWang

Thanks for your interest in our work.

MedVicuna and RadVicuna are versions of Vicuna fine-tuned on the medical and radiology conversation data we provide, respectively. We fine-tuned the LLM (Vicuna) on the medical and radiology conversation data separately, prior to training the linear layer. While training the linear layer, we kept the LLM and the image encoder frozen.
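For anyone trying to picture this setup, here is a minimal PyTorch sketch of the freezing scheme described above. The class and attribute names (`XrayGPTSketch`, `image_encoder`, `llm`, `proj`) are hypothetical placeholders for illustration, not the actual XrayGPT identifiers:

```python
import torch
import torch.nn as nn

class XrayGPTSketch(nn.Module):
    """Hypothetical sketch: frozen image encoder + frozen (already fine-tuned)
    LLM, with only the linear projection layer left trainable."""

    def __init__(self, image_encoder: nn.Module, llm: nn.Module,
                 vis_dim: int, llm_dim: int):
        super().__init__()
        self.image_encoder = image_encoder       # e.g. MedCLIP visual encoder (frozen)
        self.llm = llm                           # e.g. MedVicuna / RadVicuna (frozen)
        self.proj = nn.Linear(vis_dim, llm_dim)  # the only trainable component

        # Freeze everything except the projection layer.
        for p in self.image_encoder.parameters():
            p.requires_grad = False
        for p in self.llm.parameters():
            p.requires_grad = False

    def encode_image(self, images: torch.Tensor) -> torch.Tensor:
        # No gradients are needed (or computed) inside the frozen encoder;
        # gradients flow only into the projection applied afterwards.
        with torch.no_grad():
            feats = self.image_encoder(images)   # assumed shape: (batch, tokens, vis_dim)
        return self.proj(feats)                  # (batch, tokens, llm_dim)
```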

awaisahmednuces commented 1 year ago

Hi dear authors,

So did you train the LLM or the image encoder in training stage-1 or stage-2?

OmkarThawakar commented 1 year ago

Dear @awaisahmednuces, we train the LLM separately on medical and radiology conversation data. In our method, during both stage-1 and stage-2 training, we train only the projection layer between the image encoder and the LLM, on the MIMIC and OpenI data.
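As a rough illustration of this two-stage setup (reusing the hypothetical `XrayGPTSketch` from the earlier sketch), the optimizer in each stage receives only the projection-layer parameters, while the dataset changes between stages. `model`, `mimic_loader`, `openi_loader`, and `compute_lm_loss` are placeholders, not the actual training script:

```python
import torch

# Hypothetical two-stage loop: the optimizer only ever sees the projection
# layer's parameters; the frozen encoder and LLM never receive updates.
optimizer = torch.optim.AdamW(model.proj.parameters(), lr=1e-4)

for stage, loader in [("stage-1 (MIMIC)", mimic_loader),
                      ("stage-2 (OpenI)", openi_loader)]:
    print(f"training {stage}")
    for images, reports in loader:
        vis_tokens = model.encode_image(images)  # gradients reach proj only
        loss = compute_lm_loss(model.llm, vis_tokens, reports)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```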