Open ResonWang opened 1 year ago
Dear @ResonWang
Thanks for your interest in our work.
MedVicuna and RadVicuna are versions of Vicuna fine-tuned on the provided medical and radiology conversation data, respectively. We fine-tuned the LLM (Vicuna) on the medical and radiology conversation data separately, prior to training the linear layer. We kept the LLM and the image encoder frozen while training the linear layer.
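A minimal PyTorch sketch of this setup, where the image encoder and the LLM are frozen and only the linear projection between them is trained. The module names and dimensions below are illustrative stand-ins, not the actual MedCLIP/Vicuna components:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real components; sizes are illustrative.
image_encoder = nn.Linear(32, 16)  # stands in for the frozen image encoder
llm = nn.Linear(64, 8)             # stands in for the frozen LLM

# Freeze both large components: only the projection layer gets gradients.
for module in (image_encoder, llm):
    for p in module.parameters():
        p.requires_grad = False

# The trainable linear projection mapping visual features into the
# LLM's input embedding space.
projection = nn.Linear(16, 64)
optimizer = torch.optim.AdamW(projection.parameters(), lr=1e-3)

x = torch.randn(4, 32)            # dummy image-feature batch
feats = image_encoder(x)          # frozen forward pass
out = llm(projection(feats))      # only the projection is trainable
loss = out.pow(2).mean()          # dummy loss for illustration
loss.backward()
optimizer.step()

# Frozen weights receive no gradients; the projection does.
assert image_encoder.weight.grad is None
assert llm.weight.grad is None
assert projection.weight.grad is not None
```

Passing only `projection.parameters()` to the optimizer, together with `requires_grad = False` on the frozen modules, ensures that backpropagation updates the projection layer alone.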
Hi Dear Authors,
So did you train the LLM or the image encoder in training stage-1 or stage-2?
Dear @awaisahmednuces, we train the LLM separately on the medical and radiology conversation data. In our method, during stage-1 and stage-2 training, we train the projection layer between the image encoder and the LLM on the MIMIC & OpenI data.
Dear authors, I have some questions about the paper content: (1) What are the MedVicuna and RadVicuna in Table 1? I cannot find them in the paper or on the Internet. (2) According to Figure 1, it seems only the Linear Transformation Layer is trained in the whole framework, so why do you state in the contributions that "The LLM (Vicuna) is fine-tuned on medical data"? (3) In your work, is only the Linear Transformation Layer trained while the LLM and MedClip are both frozen?