run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: multimodal workflows #15469

Open KhyatiNinad opened 3 months ago

KhyatiNinad commented 3 months ago

Question Validation

Question

I want to build a multimodal chat using Streamlit and LlamaIndex workflows, where the user can upload images or ask text questions. Can you provide best practices and sample code?

dosubot[bot] commented 3 months ago

To build a multimodal chat using Streamlit and LlamaIndex workflows where users can upload images or provide text questions, you can follow these steps. Below is a sample code that demonstrates how to set up such a system.

Best Practices

  1. Modular Code: Separate the logic for handling text and image inputs.
  2. Error Handling: Ensure robust error handling for file uploads and API calls.
  3. User Interface: Make the UI intuitive for users to upload images and enter text.
  4. Asynchronous Operations: Use asynchronous operations for handling API calls to keep the UI responsive.
  5. Streaming Responses: Stream responses from the LLM to improve user experience.

Sample Code

import asyncio
import base64
import os

import streamlit as st
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Set your OpenAI API key
OPENAI_API_KEY = "sk-"  # Your OpenAI API token here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Initialize the OpenAIMultiModal model with a vision-capable model name
model = OpenAIMultiModal(model="gpt-4o", api_key=OPENAI_API_KEY, max_new_tokens=300)

def handle_text_input(text):
    messages = [ChatMessage(content=text, role=MessageRole.USER)]
    response = model.chat(messages)
    return response.message.content

def handle_image_input(image):
    # ImageDocument expects a base64-encoded string, not raw bytes
    image_doc = ImageDocument(image=base64.b64encode(image.read()).decode("utf-8"))
    response = model.complete(prompt="Describe this image.", image_documents=[image_doc])
    return response.text

st.title("Multimodal Chat with Streamlit and LlamaIndex")

# Text input
text_input = st.chat_input("Enter your question:")
if text_input:
    with st.chat_message("user"):
        st.write(text_input)
    response = handle_text_input(text_input)
    with st.chat_message("assistant"):
        st.write(response)

# Image upload
uploaded_image = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded_image:
    response = handle_image_input(uploaded_image)
    st.image(uploaded_image)
    st.write("Response:", response)

# Asynchronous variants for better responsiveness. OpenAIMultiModal exposes
# async counterparts (achat / acomplete); call these via asyncio.run(...) or
# an existing event loop instead of the synchronous handlers above.
async def handle_text_input_async(text):
    messages = [ChatMessage(content=text, role=MessageRole.USER)]
    response = await model.achat(messages)
    return response.message.content

async def handle_image_input_async(image):
    image_doc = ImageDocument(image=base64.b64encode(image.read()).decode("utf-8"))
    response = await model.acomplete(prompt="Describe this image.", image_documents=[image_doc])
    return response.text

Explanation

  1. Model Initialization: The OpenAIMultiModal model is initialized with the required parameters.
  2. Text Input Handling: The handle_text_input function processes text input and returns the model's response.
  3. Image Input Handling: The handle_image_input function processes image input and returns the model's response.
  4. Streamlit UI: The Streamlit interface allows users to input text and upload images, displaying the model's responses accordingly.
  5. Chat UI: The st.chat_input and st.chat_message functions render user queries and model responses in a chat-style layout; a token-by-token streaming sketch is included below.
  6. Asynchronous Operations: The async handler variants use the model's achat and acomplete methods so API calls do not block the UI, improving responsiveness.

This setup provides a basic framework for a multimodal chat application using Streamlit and LlamaIndex. Adjust the model parameters and error handling as needed for your specific use case. For more advanced functionalities, such as streaming responses, refer to the detailed examples provided in the LlamaIndex documentation [1][2][3][4][5].
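
As a rough illustration of the error-handling and streaming best practices above, here is a minimal sketch that streams the multimodal response token by token via OpenAIMultiModal.stream_complete. It reuses model, base64, and ImageDocument from the sample code; stream_image_response is an illustrative helper name, and the exact streaming interface may vary by llama-index version.

def stream_image_response(uploaded_image, prompt="Describe this image."):
    response_container = st.empty()
    response_str = ""
    try:
        # ImageDocument expects a base64-encoded string
        image_doc = ImageDocument(image=base64.b64encode(uploaded_image.read()).decode("utf-8"))
        # stream_complete yields partial responses; .delta holds the newly generated text
        for chunk in model.stream_complete(prompt=prompt, image_documents=[image_doc]):
            response_str += chunk.delta or ""
            response_container.write(response_str)
    except Exception as exc:  # surface upload/API failures in the UI instead of crashing
        st.error(f"Failed to process image: {exc}")
    return response_str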

To continue talking to Dosu, mention @dosu.

KhyatiNinad commented 3 months ago

@dosu But how do I integrate a workflow here?

dosubot[bot] commented 3 months ago

To integrate the workflow into the multimodal chat application using Streamlit and LlamaIndex, you can follow the provided code structure. Here is a step-by-step guide based on the given code:

  1. Initialize the Streamlit Application:

    • Set up the Streamlit page configuration.
    • Initialize the chat messages history in the session state.
  2. Load Data Using LlamaIndex:

    • Use the WikipediaReader to load data from a specified Wikipedia page.
    • Configure the LLM settings using OpenAI's GPT-3.5-turbo model.
    • Create a VectorStoreIndex from the loaded documents.
  3. Create the Chat Engine:

    • Initialize the chat engine using the VectorStoreIndex.
  4. Display Chat Interface:

    • Display prior chat messages.
    • Provide a selection of predefined questions using streamlit_pills.
    • Handle user input and generate responses using the chat engine.

Here is the complete code to integrate the workflow:

import asyncio
from typing import Any, Dict

from llama_index.core import (
    Settings,
    VectorStoreIndex,
)
from llama_index.llms.openai import OpenAI
from llama_index.readers.wikipedia import WikipediaReader

# Create a new event loop
loop = asyncio.new_event_loop()

# Set the event loop as the current event loop
asyncio.set_event_loop(loop)

class StreamlitChatPack:
    """Streamlit chatbot pack."""

    def __init__(
        self,
        wikipedia_page: str = "Snowflake Inc.",
        run_from_main: bool = False,
        **kwargs: Any,
    ) -> None:
        """Init params."""
        if not run_from_main:
            raise ValueError(
                "Please run this llama-pack directly with "
                "`streamlit run [download_dir]/streamlit_chatbot/base.py`"
            )

        self.wikipedia_page = wikipedia_page

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {}

    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Run the pipeline."""
        import streamlit as st
        from streamlit_pills import pills

        st.set_page_config(
            page_title=f"Chat with {self.wikipedia_page}'s Wikipedia page, powered by LlamaIndex",
            page_icon="🦙",
            layout="centered",
            initial_sidebar_state="auto",
            menu_items=None,
        )

        if "messages" not in st.session_state:  # Initialize the chat messages history
            st.session_state["messages"] = [
                {"role": "assistant", "content": "Ask me a question about Snowflake!"}
            ]

        st.title(
            f"Chat with {self.wikipedia_page}'s Wikipedia page, powered by LlamaIndex 💬🦙"
        )
        st.info(
            "This example is powered by the **[Llama Hub Wikipedia Loader](https://llamahub.ai/l/wikipedia)**. Use any of [Llama Hub's many loaders](https://llamahub.ai/) to retrieve and chat with your data via a Streamlit app.",
            icon="ℹ️",
        )

        def add_to_message_history(role, content):
            message = {"role": role, "content": str(content)}
            st.session_state["messages"].append(
                message
            )  # Add response to message history

        @st.cache_resource
        def load_index_data():
            loader = WikipediaReader()
            docs = loader.load_data(pages=[self.wikipedia_page])
            Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.5)

            return VectorStoreIndex.from_documents(docs)

        index = load_index_data()

        selected = pills(
            "Choose a question to get started or write your own below.",
            [
                "What is Snowflake?",
                "What company did Snowflake announce they would acquire in October 2023?",
                "What company did Snowflake acquire in March 2022?",
                "When did Snowflake IPO?",
            ],
            clearable=True,
            index=None,
        )

        if "chat_engine" not in st.session_state:  # Initialize the query engine
            st.session_state["chat_engine"] = index.as_chat_engine(
                chat_mode="context", verbose=True
            )

        for message in st.session_state["messages"]:  # Display the prior chat messages
            with st.chat_message(message["role"]):
                st.write(message["content"])

        # To avoid duplicated display of answered pill questions each rerun
        if selected and selected not in st.session_state.get(
            "displayed_pill_questions", set()
        ):
            st.session_state.setdefault("displayed_pill_questions", set()).add(selected)
            with st.chat_message("user"):
                st.write(selected)
            with st.chat_message("assistant"):
                response = st.session_state["chat_engine"].stream_chat(selected)
                response_str = ""
                response_container = st.empty()
                for token in response.response_gen:
                    response_str += token
                    response_container.write(response_str)
                add_to_message_history("user", selected)
                add_to_message_history("assistant", response)

        if prompt := st.chat_input(
            "Your question"
        ):  # Prompt for user input and save to chat history
            add_to_message_history("user", prompt)

            # Display the new question immediately after it is entered
            with st.chat_message("user"):
                st.write(prompt)

            # If last message is not from assistant, generate a new response
            # if st.session_state["messages"][-1]["role"] != "assistant":
            with st.chat_message("assistant"):
                response = st.session_state["chat_engine"].stream_chat(prompt)
                response_str = ""
                response_container = st.empty()
                for token in response.response_gen:
                    response_str += token
                    response_container.write(response_str)
                # st.write(response.response)
                add_to_message_history("assistant", response.response)

            # Save the state of the generator
            st.session_state["response_gen"] = response.response_gen

if __name__ == "__main__":
    StreamlitChatPack(run_from_main=True).run()

This code sets up a Streamlit application that uses LlamaIndex to create a chat interface based on data from a Wikipedia page. The chat engine is powered by OpenAI's GPT-3.5-turbo model. For more examples and detailed guidance, you can refer to the LlamaIndex documentation [1][2][3].

Additionally, you can download the necessary LlamaIndex pack using the llamaindex-cli and run the app directly with the following commands:

pip install llama-index
llamaindex-cli download-llamapack StreamlitChatPack --download-dir ./streamlit_chatbot_pack
export OPENAI_API_KEY="sk-..."
streamlit run ./streamlit_chatbot_pack/base.py

To continue talking to Dosu, mention @dosu.

KhyatiNinad commented 3 months ago

@dosu No. I mean, how do I use the newly added llamaindex Workflows?

dosubot[bot] commented 3 months ago

To use the newly added LlamaIndex workflows in your multimodal chat application, follow these steps:

  1. Install Required Libraries: Ensure you have the necessary libraries installed:

    %pip install mlflow>=2.15 llama-index>=0.10.44 -q
  2. Set Up Environment Variables: Set up your OpenAI API key:

    import os
    from getpass import getpass
    
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
  3. Create a VectorStoreIndex: Create an index using LlamaIndex:

    from llama_index.core import Document, VectorStoreIndex
    from llama_index.core.llms import ChatMessage
    
    llama_index_example_document = Document.example()
    index = VectorStoreIndex.from_documents([llama_index_example_document])
  4. Perform Inference: Use the index to perform different types of inference:

    # Query Engine
    query_response = index.as_query_engine().query("What is llama_index?")
    print(query_response)
    
    # Chat Engine
    chat_response = index.as_chat_engine().chat(
        "What is llama_index?",
        chat_history=[
            ChatMessage(role="system", content="You are an expert on RAG!")
        ],
    )
    print(chat_response)
    
    # Retriever
    retriever_response = index.as_retriever().retrieve("What is llama_index?")
    print(retriever_response)
  5. Log the Index with MLflow: Log the index to MLflow for tracking and versioning:

    import mlflow
    
    mlflow.llama_index.autolog()  # Enable tracing
    
    with mlflow.start_run() as run:
        mlflow.llama_index.log_model(
            index,
            artifact_path="llama_index",
            engine_type="query",  # Defines the pyfunc and spark_udf inference type
            input_example="hi",  # Infers signature
            registered_model_name="my_llama_index_vector_store",  # Stores an instance in the model registry
        )

        run_id = run.info.run_id
        model_uri = f"runs:/{run_id}/llama_index"
        print(f"Unique identifier for the model location for loading: {model_uri}")
  6. Load the Index and Perform Inference: Load the model and perform inference:

    # Inference via Llama Index
    index = mlflow.llama_index.load_model(model_uri)
    query_response = index.as_query_engine().query("hi")
    print(query_response)
    
    # Inference via MLflow PyFunc
    index = mlflow.pyfunc.load_model(model_uri)
    query_response = index.predict("hi")
    print(query_response)

By following these steps, you can integrate LlamaIndex workflows into your multimodal chat application, leveraging the capabilities of LlamaIndex and MLflow for robust AI application development [1].
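
For reference, the newly added Workflows themselves are an event-driven abstraction in llama_index.core.workflow, separate from the MLflow steps above. Here is a minimal sketch of what a workflow looks like, assuming the Workflow, step, StartEvent, and StopEvent APIs from recent llama-index releases; EchoWorkflow and run_query are illustrative names, and the decorator form may differ slightly between versions.

from llama_index.core.workflow import (
    StartEvent,
    StopEvent,
    Workflow,
    step,
)

class EchoWorkflow(Workflow):
    """A single-step workflow that echoes the query passed to .run()."""

    @step
    async def run_query(self, ev: StartEvent) -> StopEvent:
        # keyword arguments passed to workflow.run(...) are available on the StartEvent
        query = getattr(ev, "query", "")
        return StopEvent(result=f"You asked: {query}")

# Usage (from async code, e.g. asyncio.run(...) or a notebook cell):
# workflow = EchoWorkflow(timeout=60, verbose=False)
# result = await workflow.run(query="What is llama_index?")
# print(result)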

To continue talking to Dosu, mention @dosu.

KhyatiNinad commented 3 months ago

@dosu where is the workflow here?

dosubot[bot] commented 3 months ago

The workflow for integrating LlamaIndex with a multimodal chat application is described in the following code snippets:

  1. Toolkit Configuration:

    # IndexToolConfig and LlamaToolkit come from LlamaIndex's LangChain
    # integration (e.g. llama_index.core.langchain_helpers.agents in recent
    # versions), and query_engine is assumed to be an existing query engine.
    index_configs = [
        IndexToolConfig(
            query_engine=query_engine,
            name="blah",
            description="blah blah",
            index_query_kwargs={},
            tool_kwargs={"return_direct": True, "return_sources": True},
        ),
    ]
    toolkit = LlamaToolkit(
        index_configs=index_configs,
    )
  2. Agent Setup:

    # ConversationBufferMemory and ChatOpenAI come from LangChain (e.g.
    # langchain.memory and langchain_openai); create_llama_chat_agent comes
    # from the same LlamaIndex LangChain helpers noted above.
    prefix_message = "only ever return blahs"

    memory = ConversationBufferMemory(memory_key="chat_history")
    llm = ChatOpenAI(
        streaming=True,
        temperature=0,
        verbose=True,
    )
    agent_chain = create_llama_chat_agent(
        toolkit,
        llm,
        memory=memory,
        verbose=True,
        agent_kwargs={"prefix": prefix_message},
        return_sources=True,
    )
  3. Running the Agent:

    res = await agent_chain.acall("say something cool")

This workflow sets up the necessary configurations and runs the agent to handle chat interactions, returning a JSON string that includes the sources [1].
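
If the goal is to drive the multimodal chat through one of the newly added Workflows rather than a LangChain agent, a rough sketch along these lines may be a starting point. It assumes the llama_index.core.workflow API and the OpenAIMultiModal model shown earlier in this thread; the names MultimodalChatWorkflow, text, and image_bytes are illustrative, and the model name should be any vision-capable model available to your account.

import base64

from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.schema import ImageDocument
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

class MultimodalChatWorkflow(Workflow):
    """Route a user turn to a multimodal LLM: text-only or text plus image."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # assumed vision-capable model; adjust as needed
        self.model = OpenAIMultiModal(model="gpt-4o", max_new_tokens=300)

    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        text = getattr(ev, "text", "")
        image_bytes = getattr(ev, "image_bytes", None)  # e.g. st.file_uploader(...).read()
        if image_bytes is not None:
            image_doc = ImageDocument(image=base64.b64encode(image_bytes).decode("utf-8"))
            response = await self.model.acomplete(
                prompt=text or "Describe this image.",
                image_documents=[image_doc],
            )
            return StopEvent(result=response.text)
        response = await self.model.achat([ChatMessage(role=MessageRole.USER, content=text)])
        return StopEvent(result=response.message.content)

# In the Streamlit app (inside async code):
# workflow = MultimodalChatWorkflow(timeout=120)
# answer = await workflow.run(text=prompt, image_bytes=image.read() if image else None)
# st.write(answer)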

To continue talking to Dosu, mention @dosu.