run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: How to do a text-based RAG using AzureOpenAIMultiModal and gpt-4o #13727

Open mballav opened 1 month ago

mballav commented 1 month ago

Question

I am trying to use gpt-4o as my model for RAG on PDF/txt documents. Could someone please provide an example of how I can do that?

Do I need to use the AzureOpenAIMultiModal class or AzureOpenAI?

When I use AzureOpenAI, it complains that the model is not supported. Here is my code:

        return AzureOpenAI(
            model="gpt-4o-2024-05-13",
            deployment_name="gpt4o",
            azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
            api_key=os.getenv("AZURE_OPENAI_API_KEY"),
            openai_api_type="azure",
            openai_api_version='2024-05-13',
            temperature=0
        )

And, here is the error message:

    ValueError: Unknown model 'gpt-4o-2024-05-13'. Please provide a valid OpenAI model name in: gpt-4, gpt-4-32k, gpt-4-1106-preview, gpt-4-0125-preview, gpt-4-turbo-preview, gpt-4-vision-preview, gpt-4-1106-vision-preview, gpt-4-turbo-2024-04-09, gpt-4-turbo, gpt-4-0613, gpt-4-32k-0613, gpt-4-0314, gpt-4-32k-0314, gpt-3.5-turbo, gpt-3.5-turbo-16k, gpt-3.5-turbo-0125, gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, gpt-3.5-turbo-0301, text-davinci-003, text-davinci-002, gpt-3.5-turbo-instruct, text-ada-001, text-babbage-001, text-curie-001, ada, babbage, curie, davinci, gpt-35-turbo-16k, gpt-35-turbo, gpt-35-turbo-0125, gpt-35-turbo-1106, gpt-35-turbo-0613, gpt-35-turbo-16k-0613
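
The model list in this error comes from the installed package's table of known models, so a release that predates gpt-4o will reject the name. A quick way to check which version is installed (this matches the fix suggested further down the thread):

    import importlib.metadata

    # gpt-4o was added to the known-model table in newer releases of
    # llama-index-llms-openai; an old version here explains the ValueError.
    print(importlib.metadata.version("llama-index-llms-openai"))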
dosubot[bot] commented 1 month ago

To use the gpt-4o model for Retrieval-Augmented Generation (RAG) on PDF/txt documents, you need to use the AzureOpenAIMultiModal class. The AzureOpenAI class does not support the gpt-4o model, which is why you are encountering the error.

Here is an example of how to configure and use the AzureOpenAIMultiModal class for your task:

  1. Set Environment Variables: Ensure the following environment variables are set:

    • OPENAI_API_VERSION: 2023-05-15
    • AZURE_OPENAI_ENDPOINT: Your Azure endpoint URL, e.g., https://YOUR_RESOURCE_NAME.openai.azure.com/
    • AZURE_OPENAI_API_KEY: Your Azure API key
  2. Initialize the AzureOpenAIMultiModal Class: Use the following code to initialize the AzureOpenAIMultiModal class:

    from llama_index.legacy.multi_modal_llms.azure_openai import AzureOpenAIMultiModal

    azure_openai_multi_modal = AzureOpenAIMultiModal(
        model="gpt-4o",
        engine="your_deployment_name",  # Replace with your deployment name
        azure_endpoint="https://YOUR_RESOURCE_NAME.openai.azure.com/",  # Replace with your endpoint URL
        api_key="your_api_key",  # Replace with your API key
        api_version="2023-05-15",
    )
  3. Build and Query Multi-Modal Vector Stores: Use the MultiModalVectorStoreIndex and SimpleMultiModalQueryEngine to build and query multi-modal vector stores. Here is an example:

    from llama_index.core.indices import MultiModalVectorStoreIndex
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from llama_index.core import SimpleDirectoryReader, StorageContext
    import qdrant_client
    
    # Create a local Qdrant vector store
    client = qdrant_client.QdrantClient(path="qdrant_mm_db")
    
    text_store = QdrantVectorStore(client=client, collection_name="text_collection")
    image_store = QdrantVectorStore(client=client, collection_name="image_collection")
    
    storage_context = StorageContext.from_defaults(vector_store=text_store, image_store=image_store)
    
    # Load text and image documents from local folder
    documents = SimpleDirectoryReader("./data_folder/").load_data()
    
    # Create the MultiModal index
    index = MultiModalVectorStoreIndex.from_documents(documents, storage_context=storage_context)
    
    # Use MultiModal Retriever and Query Engine
    from llama_index.core import PromptTemplate
    from llama_index.core.query_engine import SimpleMultiModalQueryEngine

    query_str = "Tell me more about the Porsche"

    # Retrieve the top text and image nodes for the query
    retriever_engine = index.as_retriever(similarity_top_k=3, image_similarity_top_k=3)
    retrieval_results = retriever_engine.retrieve(query_str)

    qa_tmpl_str = (
        "Context information is below.\n"
        "---------------------\n"
        "{context_str}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, "
        "answer the query.\n"
        "Query: {query_str}\n"
        "Answer: "
    )
    qa_tmpl = PromptTemplate(qa_tmpl_str)

    query_engine = index.as_query_engine(multi_modal_llm=azure_openai_multi_modal, text_qa_template=qa_tmpl)

    response = query_engine.query(query_str)
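
If the query succeeds, the response can be printed and its retrieved sources inspected. A short usage sketch, assuming the objects above were created as shown (source_nodes holds the retrieved text/image nodes with their similarity scores):

    # Print the synthesized answer
    print(str(response))

    # Inspect which nodes were retrieved to ground the answer
    for node_with_score in response.source_nodes:
        print(node_with_score.score, node_with_score.node.get_content()[:100])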

This setup will allow you to use the gpt-4o model for RAG on PDF/txt documents with Azure OpenAI [1][2][3][4].


To continue the conversation, mention @dosu.

mballav commented 1 month ago

@dosu If I were to create MultiModalVectorStoreIndex from nodes, how would I do that?
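
For what it's worth, MultiModalVectorStoreIndex follows the VectorStoreIndex constructor, so it can be built from nodes directly. A minimal sketch, assuming the documents and storage_context from the example above, with SentenceSplitter as an illustrative node parser:

    from llama_index.core.node_parser import SentenceSplitter
    from llama_index.core.indices import MultiModalVectorStoreIndex

    # Parse the loaded documents into nodes, then build the index
    # from the nodes instead of from the raw documents.
    nodes = SentenceSplitter(chunk_size=512).get_nodes_from_documents(documents)
    index = MultiModalVectorStoreIndex(nodes=nodes, storage_context=storage_context)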

logan-markewich commented 1 month ago

@mballav it's supported on the LLM, but you probably need to update:

pip install -U llama-index-llms-azure-openai llama-index-llms-openai
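
With up-to-date packages, the original snippet should then work with two parameter fixes: api_version takes an Azure REST API version string (e.g. "2024-02-01"), not the model's release date, and the openai_api_type/openai_api_version keywords belong to a different client's signature. A minimal sketch under those assumptions:

    import os
    from llama_index.llms.azure_openai import AzureOpenAI

    # "gpt-4o" is recognized once llama-index-llms-openai is current
    llm = AzureOpenAI(
        model="gpt-4o",
        deployment_name="gpt4o",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-02-01",  # Azure API version, not the model date
        temperature=0,
    )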

mballav commented 1 month ago

Thanks a lot, @logan-markewich! You saved my day.