Finetuning with context/history

I am currently fine-tuning LLaVA on medical images and reports. In medical reports, there are frequent references to previous images. For instance, a study may involve multiple reports, where the first report describes the first image and subsequent reports refer back to previous images. To accommodate this, I have developed dynamic prompts that take into account the image references.

However, while experimenting with the LLaVA online demo, I noticed that the model's internal history or memory is reset every time a new image is provided. Consequently, the model loses the ability to reference the previous image when answering a new prompt. This single image one inference turn procedure may not be ideal for my task.

I would like to know the best approach to address this challenge:

How can I pass multiple images to the model effectively? Could you provide a detailed explanation of the process? Is there a way to prevent the model's internal history from resetting after uploading an image? This would enable the model to make references to the previous image when responding to subsequent prompt questions.

I hope this captures your query accurately. Please let me know if you need further assistance!

haotian-liu / LLaVA

Finetuning with context/history #234