run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: How to do a text-based RAG using AzureOpenAIMultiModal and gpt-4o #13727

Open mballav opened 1 month ago

mballav commented 1 month ago

Question

I am trying to use gpt-4o as my model for RAG on PDF/txt documents. Could someone please provide an example of how I can do that?

Do I need to use the AzureOpenAIMultiModal class or AzureOpenAI?

When I use AzureOpenAI, it complains that the model is not supported. Here is my code:

        return AzureOpenAI(
            model="gpt-4o-2024-05-13",
            deployment_name="gpt4o",
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            openai_api_type="azure",
            openai_api_version='2024-05-13',
            temperature=0
        )

And, here is the error message:

    ValueError: Unknown model 'gpt-4o-2024-05-13'. Please provide a valid OpenAI model name in: gpt-4, gpt-4-32k, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview, gpt-4-vision-preview, gpt-4-1106-vision-preview, gpt-4-turbo-2024-04-09, gpt-4-turbo, gpt-4-0613, gpt-4-32k-0613, gpt-4-0314, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-16k, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, text-davinci-003, text-davinci-002, gpt-3.5-turbo-instruct, text-ada-001, text-babbage-001, text-curie-001, ada, babbage, curie, davinci, gpt-35-turbo-16k, gpt-35-turbo, gpt-35-turbo-0125, gpt-35-turbo-1106, gpt-35-turbo-0613, gpt-35-turbo-16k-0613
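
The model list in this error comes from the installed package's table of known models, so a release that predates gpt-4o will reject the name. A quick way to check which version is installed (this matches the fix suggested further down the thread):

    import importlib.metadata

    # gpt-4o was added to the known-model table in newer releases of
    # llama-index-llms-openai; an old version here explains the ValueError.
    print(importlib.metadata.version("llama-index-llms-openai"))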
dosubot[bot] commented 1 month ago

To use the gpt-4o model for Retrieval-Augmented Generation (RAG) on PDF/txt documents, you need to use the AzureOpenAIMultiModal class. The AzureOpenAI class does not support the gpt-4o model, which is why you are encountering the error.

Here is an example of how to configure and use the AzureOpenAIMultiModal class for your task:

  1. Set Environment Variables: Ensure the following environment variables are set:

    • OPENAI_API_VERSION: 2023-05-15
    • AZURE_OPENAI_ENDPOINT: Your Azure endpoint URL, e.g., https://YOUR_RESOURCE_NAME.openai.azure.com/
    • AZURE_OPENAI_API_KEY: Your Azure API key
  2. Initialize the AzureOpenAIMultiModal Class: Use the following code to initialize the AzureOpenAIMultiModal class:

    from llama_index.legacy.multi_modal_llms.azure_openai import AzureOpenAIMultiModal

    azure_openai_multi_modal = AzureOpenAIMultiModal(
        model="gpt-4o",
        engine="your_deployment_name",  # Replace with your deployment name
        azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com/",  # Replace with your endpoint URL
        api_key="your_api_key",  # Replace with your API key
        api_version="2023-05-15",
    )
  3. Build and Query Multi-Modal Vector Stores: Use the MultiModalVectorStoreIndex and SimpleMultiModalQueryEngine to build and query multi-modal vector stores. Here is an example:

    from llama_index.core.indices import MultiModalVectorStoreIndex
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from llama_index.core import SimpleDirectoryReader, StorageContext
    import qdrant_client
    
    # Create a local Qdrant vector store
    client = qdrant_client.QdrantClient(path="qdrant_mm_db")
    
    text_store = QdrantVectorStore(client=client, collection_name="text_collection")
    image_store = QdrantVectorStore(client=client, collection_name="image_collection")
    
    storage_context = StorageContext.from_defaults(vector_store=text_store, image_store=image_store)
    
    # Load text and image documents from local folder
    documents = SimpleDirectoryReader("./data_folder/").load_data()
    
    # Create the MultiModal index
    index = MultiModalVectorStoreIndex.from_documents(documents, storage_context=storage_context)
    
    # Use MultiModal Retriever and Query Engine
    from llama_index.core import PromptTemplate
    from llama_index.core.query_engine import SimpleMultiModalQueryEngine

    query_str = "Tell me more about the Porsche"

    # Retrieve the top text and image nodes for the query
    retriever_engine = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
    retrieval_results = retriever_engine.retrieve(query_str)

    qa_tmpl_str = (
        "Context information is below.\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the query.\n"
        "Query: {query_str}\n"
        "Answer: "
    )
    qa_tmpl = PromptTemplate(qa_tmpl_str)

    query_engine = index.as_query_engine(multi_modal_llm=azure_openai_multi_modal, text_qa_template=qa_tmpl)

    response = query_engine.query(query_str)
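
If the query succeeds, the response can be printed and its retrieved sources inspected. A short usage sketch, assuming the objects above were created as shown (source_nodes holds the retrieved text/image nodes with their similarity scores):

    # Print the synthesized answer
    print(str(response))

    # Inspect which nodes were retrieved to ground the answer
    for node_with_score in response.source_nodes:
        print(node_with_score.score, node_with_score.node.get_content()[:100])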

This setup will allow you to use the gpt-4o model for RAG on PDF/txt documents with Azure OpenAI [1][2][3][4].


To continue the conversation, mention @dosu.

mballav commented 1 month ago

@dosu If I were to create MultiModalVectorStoreIndex from nodes, how would I do that?
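
For what it's worth, MultiModalVectorStoreIndex follows the VectorStoreIndex constructor, so it can be built from nodes directly. A minimal sketch, assuming the documents and storage_context from the example above, with SentenceSplitter as an illustrative node parser:

    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.indices import MultiModalVectorStoreIndex

    # Parse the loaded documents into nodes, then build the index
    # from the nodes instead of from the raw documents.
    nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
    index = MultiModalVectorStoreIndex(nodes=nodes, storage_context=storage_context)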

logan-markewich commented 1 month ago

@mballav it's supported on the LLM, but you probably need to update:

pip install -U llama-index-llms-azure-openai llama-index-llms-openai
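
With up-to-date packages, the original snippet should then work with two parameter fixes: api_version takes an Azure REST API version string (e.g. "2024-02-01"), not the model's release date, and the openai_api_type/openai_api_version keywords belong to a different client's signature. A minimal sketch under those assumptions:

    import os
    from llama_index.llms.azure_openai import AzureOpenAI

    # "gpt-4o" is recognized once llama-index-llms-openai is current
    llm = AzureOpenAI(
        model="gpt-4o",
        deployment_name="gpt4o",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-02-01",  # Azure API version, not the model date
        temperature=0,
    )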

mballav commented 1 month ago

Thanks a lot, @logan-markewich! You saved my day.