
[Bug]: KeyError: 'context_str' in query_engine.image_query(image_path, query_str) #14730

Open · StarVolcano opened 2 months ago

StarVolcano commented 2 months ago

Bug Description

Got a KeyError: 'context_str' when executing the query_engine.image_query(image_path, query_str) function. I think the problem is that in the _get_response_with_images() function of the SimpleMultiModalQueryEngine class in llama_index/core/query_engine/multi_modal.py, context_str is never provided to the template:

fmt_prompt = self._image_qa_template.format(
    query_str=prompt_str,
)
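
Since PromptTemplate.format ultimately delegates to Python's str.format, the failure mode can be illustrated with a plain string (a minimal sketch; the template text here is hypothetical, not the library's actual default):

# Minimal sketch of the failure: formatting a template that contains
# {context_str} while supplying only query_str raises KeyError.
template = "Context:\n{context_str}\nQuery: {query_str}\nAnswer: "

template.format(query_str="What is the main object in the picture?")
# -> KeyError: 'context_str'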

Version

0.10.50

Steps to Reproduce

from llama_index.core.indices.multi_modal.base import (
    MultiModalVectorStoreIndex,
)
from llama_index.vector_stores.qdrant import QdrantVectorStore
from llama_index.core import SimpleDirectoryReader, StorageContext
from llama_index.embeddings.clip import ClipEmbedding

import qdrant_client

client = qdrant_client.QdrantClient(path="qdrant_mm_wiki")

text_store = QdrantVectorStore(client=client, collection_name="text_collection")
image_store = QdrantVectorStore(client=client, collection_name="image_collection")
storage_context = StorageContext.from_defaults(
    vector_store=text_store, image_store=image_store
)

from llama_index.core import Settings

Settings.embed_model = ClipEmbedding()
image_embed_model = ClipEmbedding()

documents = SimpleDirectoryReader("./mixed_wiki", recursive=True).load_data()

from llama_index.core.node_parser import SentenceSplitter

Settings.text_splitter = SentenceSplitter(chunk_size=60, chunk_overlap=5)

index = MultiModalVectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    transformations=[SentenceSplitter(chunk_size=60, chunk_overlap=5)],
    image_embed_model=image_embed_model,
)

from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_model = OllamaMultiModal(model="llava:13b")

import numpy as np
from llama_index.core.prompts import PromptTemplate
from llama_index.core.query_engine import SimpleMultiModalQueryEngine

qa_tmpl_str = (
    "Given the images provided, "
    "answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)
qa_tmpl = PromptTemplate(qa_tmpl_str)

query_str = 'What is the main object in the picture?'

# query and response
query_engine = index.as_query_engine(llm=mm_model, text_qa_template=qa_tmpl)
response = query_engine.image_query('./mixed_wiki/1.jpg', query_str)
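
As a temporary workaround, the failing call can seemingly be sidestepped by also passing the single-variable template as image_qa_template, so the image path never needs {context_str} (a sketch; this assumes as_query_engine forwards image_qa_template through to SimpleMultiModalQueryEngine, as that class's constructor signature suggests):

# Workaround sketch (assumes kwarg forwarding): reuse the {query_str}-only
# template for the image QA path so no {context_str} is required.
query_engine = index.as_query_engine(
    llm=mm_model,
    text_qa_template=qa_tmpl,
    image_qa_template=qa_tmpl,
)
response = query_engine.image_query('./mixed_wiki/1.jpg', query_str)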

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 2 months ago

Hey there, @StarVolcano! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you while you wait for a human maintainer.

To fix the KeyError: 'context_str' in the _get_response_with_images() function, you need to ensure that a context_str value is supplied when formatting the _image_qa_template. The current implementation passes only query_str to the template's format call, which raises the KeyError whenever the template contains a {context_str} placeholder.

Here is the corrected version of the _get_response_with_images() function:

def _get_response_with_images(
    self,
    prompt_str: str,
    image_nodes: List[ImageNode],
) -> RESPONSE_TYPE:
    # Build a textual context from the retrieved image nodes so the
    # template's {context_str} placeholder can be filled.
    context_str = "\n\n".join([r.get_content() for r in image_nodes])
    fmt_prompt = self._image_qa_template.format(
        context_str=context_str, query_str=prompt_str
    )
    # Send the formatted prompt together with the images to the multi-modal LLM.
    llm_response = self._multi_modal_llm.complete(
        prompt=fmt_prompt,
        image_documents=[image_node.node for image_node in image_nodes],
    )
    return Response(
        response=str(llm_response),
        source_nodes=image_nodes,
        metadata={"image_nodes": image_nodes},
    )

This ensures that context_str is supplied to the format call, preventing the KeyError. Additionally, make sure that the image_qa_template is set up to accept both context_str and query_str as template variables [1][2].
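
For example, a template compatible with the corrected function would declare both placeholders (an illustrative sketch, not the library's default prompt):

from llama_index.core.prompts import PromptTemplate

# Illustrative template exposing both variables used by the corrected
# _get_response_with_images() above.
qa_tmpl = PromptTemplate(
    "Context from the retrieved images is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the images and the context, answer the query.\n"
    "Query: {query_str}\n"
    "Answer: "
)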

To continue talking to Dosu, mention @dosu.