A second file can be uploaded, but Llava and Salmonn keep referring to the previously uploaded file. I believe this is because they are sent the messageHistory, which takes priority over the newly uploaded file.
To replicate:
1. Upload an image or audio file.
2. Ask "What's in this [image/audio file]?" and receive a description of file 1.
3. Upload a second image or audio file.
4. Ask "What's in this [image/audio file]?" again and receive a description of file 1 once more (or even "In the audio file, there is a person speaking about the [contents of image 1]" if you switched between an image and an audio file).
A simple fix would be to (warn the user and) remove all previous message history when a new file is uploaded.
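A rough sketch of what that fix might look like, assuming the client keeps its state in a plain object. Only `messageHistory` comes from the report above; `ChatState`, `ChatMessage`, `currentFile`, and `onFileUploaded` are hypothetical names used for illustration, and the real upload handler may be structured quite differently.

```typescript
// Hypothetical shapes: apart from `messageHistory`, these names are
// illustrative and not taken from the project.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface ChatState {
  messageHistory: ChatMessage[];
  currentFile: string | null; // name of the most recently uploaded file
}

// Returns the state to use after an upload. Any existing history is dropped
// so the model only sees messages about the new file; the user is warned
// when that actually discards something.
function onFileUploaded(
  state: ChatState,
  fileName: string,
  warn: (message: string) => void = console.warn
): ChatState {
  if (state.messageHistory.length > 0) {
    warn("A new file was uploaded, so the previous conversation was cleared.");
  }
  return { messageHistory: [], currentFile: fileName };
}
```

Returning a fresh state object rather than mutating the old one keeps the reset easy to test, and the `warn` callback could be swapped for whatever notification mechanism the UI already uses.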