Closed gich2009 closed 5 days ago
Just noticed that the OpenAIMultiModal and GeminiMultiModal interfaces implement .chat(). AnthropicMultiModal has not implemented it yet, but I'm sure it will eventually. The issue is that the .chat() methods do not take image_documents as a parameter, while the .complete() methods support image_documents but do not take messages. How do I maintain a stateful conversation with a multimodal LLM while passing in image_documents?
It would also be great to allow passing in a memory object, since llama-index offers so many different memory classes.
This may not be immediately necessary to implement. Here is a workaround for anyone else who is interested:
```python
from llama_index.core import SimpleDirectoryReader
from llama_index.core.llms import ChatMessage
from llama_index.core.memory import ChatMemoryBuffer
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

openai_mm_llm = OpenAIMultiModal(
    model="gpt-4o-mini",
    api_key=SECRET_KEY,  # your OpenAI API key
    max_new_tokens=4000,
    image_detail="auto",
    temperature=0,
    timeout=100,
)

image_documents = SimpleDirectoryReader("./image").load_data()
print(image_documents)

if __name__ == "__main__":
    import time

    current_mm_llm = openai_mm_llm
    start_time = time.perf_counter()

    memory = ChatMemoryBuffer.from_defaults()

    prompt = "please explain what the image contains."
    message = ChatMessage(role="user", content=prompt)
    memory.put(message)

    # Flatten the memory into a single prompt string, since .complete()
    # takes a prompt but not a list of messages.
    prompt = "\n".join(str(m) for m in memory.get_all())

    response = current_mm_llm.complete(
        prompt=prompt,
        image_documents=image_documents,
    )
    print(response)

    end_time = time.perf_counter()
    print(f"Time taken {end_time - start_time}")
```
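The pattern behind this workaround, independent of any particular client, is: append each user turn to memory, flatten the whole memory into one prompt string, and write the model's reply back into memory so the next turn sees it. A minimal stdlib-only sketch of that loop — `fake_multimodal_complete` is a stand-in I made up for a real multimodal `.complete()` call, not a llama-index function:

```python
def fake_multimodal_complete(prompt: str, image_documents: list) -> str:
    # Stand-in for a real .complete(prompt=..., image_documents=...) call;
    # a real client would send the prompt plus images to the model here.
    return f"(reply to a {len(prompt.splitlines())}-line prompt with {len(image_documents)} image(s))"

def chat_turn(memory: list, user_text: str, image_documents: list) -> str:
    """Append the user message, flatten history to one prompt, record the reply."""
    memory.append(("user", user_text))
    # Flatten the full history into a single prompt string, since
    # .complete() takes a prompt rather than a list of messages.
    prompt = "\n".join(f"{role}: {text}" for role, text in memory)
    reply = fake_multimodal_complete(prompt, image_documents)
    # Storing the reply is the step that makes the conversation stateful:
    # the next turn's prompt will include it.
    memory.append(("assistant", reply))
    return reply

memory = []
images = ["image_1"]  # placeholder for loaded ImageDocuments
chat_turn(memory, "please explain what the image contains.", images)
chat_turn(memory, "now summarize it in one sentence.", images)
print(len(memory))  # 4 entries: two user turns, two assistant replies
```

With the real workaround above, the equivalent of the write-back step would be `memory.put(ChatMessage(role="assistant", content=str(response)))` after each `.complete()` call.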
Feature Description
Basically a SimpleChatEngine equivalent for multimodal models, so that they can also take memory and chat_history parameters.
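To make the request concrete, here is one rough shape such an engine could take. This is purely hypothetical — `MultiModalChatEngine`, its constructor parameters, and the history-flattening strategy are all illustrative assumptions, not an existing llama-index class — with the model call stubbed out so the sketch runs offline:

```python
class MultiModalChatEngine:
    """Hypothetical SimpleChatEngine analogue for multimodal LLMs.

    Holds a memory of (role, text) turns plus a fixed set of image
    documents, and flattens both into each underlying .complete() call.
    """

    def __init__(self, mm_complete, image_documents, chat_history=None):
        # mm_complete: callable(prompt, image_documents) -> str, standing
        # in for a real multimodal .complete() method.
        self._complete = mm_complete
        self._images = image_documents
        self._memory = list(chat_history or [])

    def chat(self, user_text: str) -> str:
        self._memory.append(("user", user_text))
        prompt = "\n".join(f"{role}: {text}" for role, text in self._memory)
        reply = self._complete(prompt, self._images)
        self._memory.append(("assistant", reply))
        return reply

# Usage with a stub in place of a real model:
engine = MultiModalChatEngine(
    mm_complete=lambda prompt, images: f"echo: {prompt.splitlines()[-1]}",
    image_documents=["img"],
)
print(engine.chat("what is in the image?"))  # echo: user: what is in the image?
```

A real version would presumably accept a ChatMemoryBuffer (or any llama-index memory class) instead of a bare list, mirroring how SimpleChatEngine takes memory and chat_history today.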
Reason
I don't know if there is already an abstraction for this, but I can't seem to find a good way to make memory play well with multimodal models.
Value of Feature
For consistency: llama-index generally allows you to pass chat_messages, chat_history, or messages to its abstractions, but I can't find a way to do the same for the multimodal classes.