haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
19.33k stars 2.13k forks source link

Finetuning with context/history #234

Open cyril-mino opened 1 year ago

cyril-mino commented 1 year ago

I am currently fine-tuning LLaVA on medical images and reports. In medical reports, there are frequent references to previous images. For instance, a study may involve multiple reports, where the first report describes the first image and subsequent reports refer back to previous images. To accommodate this, I have developed dynamic prompts that take into account the image references.

However, while experimenting with the LLaVA online demo, I noticed that the model's internal history or memory is reset every time a new image is provided. Consequently, the model loses the ability to reference the previous image when answering a new prompt. This single image one inference turn procedure may not be ideal for my task.

I would like to know the best approach to address this challenge:

How can I pass multiple images to the model effectively? Could you provide a detailed explanation of the process? Is there a way to prevent the model's internal history from resetting after uploading an image? This would enable the model to make references to the previous image when responding to subsequent prompt questions.

I hope this captures your query accurately. Please let me know if you need further assistance!

anushavishwanathan commented 9 months ago

Any updates on this?

I am looking for a similar requirement of being able to reference prior images for object Re-Identification task. Any help would be much appreciated. Thanks.