[Open] KhyatiNinad opened this issue 3 months ago
To build a multimodal chat using Streamlit and LlamaIndex workflows where users can upload images or provide text questions, you can follow these steps. Below is a sample code that demonstrates how to set up such a system.
import asyncio
import base64
import os

import streamlit as st
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.schema import ImageDocument
from llama_index.multi_modal_llms.openai import OpenAIMultiModal

# Set your OpenAI API key
OPENAI_API_KEY = "sk-"  # Your OpenAI API token here
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY

# Initialize the OpenAI multimodal model (any vision-capable model name works, e.g. "gpt-4o")
model = OpenAIMultiModal(model="gpt-4o", api_key=OPENAI_API_KEY, max_new_tokens=300)
def handle_text_input(text):
    """Send a text question to the model and return its reply."""
    messages = [ChatMessage(content=text, role=MessageRole.USER)]
    response = model.chat(messages)
    return response.message.content

def handle_image_input(image):
    """Send an uploaded image to the model and return its description."""
    # ImageDocument expects a base64-encoded string rather than raw bytes
    image_doc = ImageDocument(image=base64.b64encode(image.read()).decode("utf-8"))
    response = model.complete(prompt="Describe this image.", image_documents=[image_doc])
    return response.text
st.title("Multimodal Chat with Streamlit and LlamaIndex")
# Text input
text_input = st.chat_input("Enter your question:")
if text_input:
    with st.chat_message("user"):
        st.write(text_input)
    response = handle_text_input(text_input)
    with st.chat_message("assistant"):
        st.write(response)

# Image upload
uploaded_image = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded_image:
    response = handle_image_input(uploaded_image)
    st.image(uploaded_image)
    st.write("Response:", response)
# Optional: run both handlers concurrently via asyncio for better responsiveness.
# The synchronous handlers are wrapped in threads here; OpenAIMultiModal also
# exposes async counterparts (achat/acomplete) if you prefer native async calls.
async def main():
    if text_input and uploaded_image:
        await asyncio.gather(
            asyncio.to_thread(handle_text_input, text_input),
            asyncio.to_thread(handle_image_input, uploaded_image),
        )

if __name__ == "__main__":
    asyncio.run(main())
In this setup, the OpenAIMultiModal model is initialized with the required parameters. The handle_text_input function processes text input and returns the model's response, and handle_image_input does the same for an uploaded image. The st.chat_input and st.chat_message widgets capture user queries and display model responses, enhancing the user experience. The optional main function uses asyncio.gather to handle text and image inputs concurrently, improving responsiveness.

This setup provides a basic framework for a multimodal chat application using Streamlit and LlamaIndex. Adjust the model parameters and error handling as needed for your specific use case. For more advanced functionality, such as streaming responses, refer to the detailed examples in the LlamaIndex documentation [1][2][3][4][5].
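As a rough illustration, streaming an image answer could look like the sketch below. It assumes the same model and imports as above and that your llama-index version exposes stream_complete on the multimodal LLM; the helper name is purely illustrative.

def stream_image_response(image_doc, question="Describe this image."):
    """Illustrative helper: write streamed tokens into a Streamlit placeholder."""
    placeholder = st.empty()
    streamed_text = ""
    for chunk in model.stream_complete(prompt=question, image_documents=[image_doc]):
        streamed_text += chunk.delta or ""
        placeholder.write(streamed_text)
    return streamed_text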
To continue talking to Dosu, mention @dosu.
@dosu But how do I integrate a workflow here?
To integrate the workflow into the multimodal chat application using Streamlit and LlamaIndex, you can follow the provided code structure. Here is a step-by-step guide based on the given code:

1. Initialize the Streamlit Application: configure the page and the chat message history.
2. Load Data Using LlamaIndex: use WikipediaReader to load data from a specified Wikipedia page and build a VectorStoreIndex from the loaded documents.
3. Create the Chat Engine: create a chat engine from the VectorStoreIndex.
4. Display the Chat Interface: offer starter questions via streamlit_pills and stream responses into the chat.

Here is the complete code to integrate the workflow:
import asyncio
from typing import Any, Dict

from llama_index.core import (
    Settings,
    VectorStoreIndex,
)
from llama_index.llms.openai import OpenAI
from llama_index.readers.wikipedia import WikipediaReader

# Create a new event loop
loop = asyncio.new_event_loop()
# Set the event loop as the current event loop
asyncio.set_event_loop(loop)


class StreamlitChatPack:
    """Streamlit chatbot pack."""

    def __init__(
        self,
        wikipedia_page: str = "Snowflake Inc.",
        run_from_main: bool = False,
        **kwargs: Any,
    ) -> None:
        """Init params."""
        if not run_from_main:
            raise ValueError(
                "Please run this llama-pack directly with "
                "`streamlit run [download_dir]/streamlit_chatbot/base.py`"
            )
        self.wikipedia_page = wikipedia_page

    def get_modules(self) -> Dict[str, Any]:
        """Get modules."""
        return {}

    def run(self, *args: Any, **kwargs: Any) -> Any:
        """Run the pipeline."""
        import streamlit as st
        from streamlit_pills import pills

        st.set_page_config(
            page_title=f"Chat with {self.wikipedia_page}'s Wikipedia page, powered by LlamaIndex",
            page_icon="🦙",
            layout="centered",
            initial_sidebar_state="auto",
            menu_items=None,
        )

        if "messages" not in st.session_state:  # Initialize the chat messages history
            st.session_state["messages"] = [
                {"role": "assistant", "content": "Ask me a question about Snowflake!"}
            ]

        st.title(
            f"Chat with {self.wikipedia_page}'s Wikipedia page, powered by LlamaIndex 💬🦙"
        )
        st.info(
            "This example is powered by the **[Llama Hub Wikipedia Loader](https://llamahub.ai/l/wikipedia)**. Use any of [Llama Hub's many loaders](https://llamahub.ai/) to retrieve and chat with your data via a Streamlit app.",
            icon="ℹ️",
        )

        def add_to_message_history(role, content):
            message = {"role": role, "content": str(content)}
            st.session_state["messages"].append(
                message
            )  # Add response to message history

        @st.cache_resource
        def load_index_data():
            loader = WikipediaReader()
            docs = loader.load_data(pages=[self.wikipedia_page])
            Settings.llm = OpenAI(model="gpt-3.5-turbo", temperature=0.5)
            return VectorStoreIndex.from_documents(docs)

        index = load_index_data()

        selected = pills(
            "Choose a question to get started or write your own below.",
            [
                "What is Snowflake?",
                "What company did Snowflake announce they would acquire in October 2023?",
                "What company did Snowflake acquire in March 2022?",
                "When did Snowflake IPO?",
            ],
            clearable=True,
            index=None,
        )

        if "chat_engine" not in st.session_state:  # Initialize the query engine
            st.session_state["chat_engine"] = index.as_chat_engine(
                chat_mode="context", verbose=True
            )

        for message in st.session_state["messages"]:  # Display the prior chat messages
            with st.chat_message(message["role"]):
                st.write(message["content"])

        # To avoid duplicated display of answered pill questions each rerun
        if selected and selected not in st.session_state.get(
            "displayed_pill_questions", set()
        ):
            st.session_state.setdefault("displayed_pill_questions", set()).add(selected)
            with st.chat_message("user"):
                st.write(selected)
            with st.chat_message("assistant"):
                response = st.session_state["chat_engine"].stream_chat(selected)
                response_str = ""
                response_container = st.empty()
                for token in response.response_gen:
                    response_str += token
                    response_container.write(response_str)
                add_to_message_history("user", selected)
                add_to_message_history("assistant", response)

        if prompt := st.chat_input(
            "Your question"
        ):  # Prompt for user input and save to chat history
            add_to_message_history("user", prompt)

            # Display the new question immediately after it is entered
            with st.chat_message("user"):
                st.write(prompt)

            # If last message is not from assistant, generate a new response
            # if st.session_state["messages"][-1]["role"] != "assistant":
            with st.chat_message("assistant"):
                response = st.session_state["chat_engine"].stream_chat(prompt)
                response_str = ""
                response_container = st.empty()
                for token in response.response_gen:
                    response_str += token
                    response_container.write(response_str)
                # st.write(response.response)
                add_to_message_history("assistant", response.response)

                # Save the state of the generator
                st.session_state["response_gen"] = response.response_gen


if __name__ == "__main__":
    StreamlitChatPack(run_from_main=True).run()
This code sets up a Streamlit application that uses LlamaIndex to create a chat interface based on data from a Wikipedia page. The chat engine is powered by OpenAI's GPT-3.5-turbo model. For more examples and detailed guidance, you can refer to the LlamaIndex documentation [1][2][3].
Additionally, you can download the necessary LlamaIndex pack using the llamaindex-cli
and run the app directly with the following commands:
pip install llama-index
llamaindex-cli download-llamapack StreamlitChatPack --download-dir ./streamlit_chatbot_pack
export OPENAI_API_KEY="sk-..."
streamlit run ./streamlit_chatbot_pack/base.py
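If you want the app to chat over a different Wikipedia page, you can edit the __main__ block at the bottom of the downloaded base.py; the page title below is just an illustration, since the pack accepts any page via the wikipedia_page parameter shown above.

if __name__ == "__main__":
    # Any Wikipedia page title works here; "Streamlit" is only an example value
    StreamlitChatPack(wikipedia_page="Streamlit", run_from_main=True).run()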
To continue talking to Dosu, mention @dosu.
@dosu No. I mean, how do I use the newly added LlamaIndex workflows?
To use the newly added LlamaIndex workflows in your multimodal chat application, follow these steps:
Install Required Libraries: Ensure you have the necessary libraries installed:
%pip install mlflow>=2.15 llama-index>=0.10.44 -q
Set Up Environment Variables: Set up your OpenAI API key:
import os
from getpass import getpass
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
Create a VectorStoreIndex: Create an index using LlamaIndex:
from llama_index.core import Document, VectorStoreIndex
from llama_index.core.llms import ChatMessage
llama_index_example_document = Document.example()
index = VectorStoreIndex.from_documents([llama_index_example_document])
Perform Inference: Use the index to perform different types of inference:
# Query Engine
query_response = index.as_query_engine().query("What is llama_index?")
print(query_response)
# Chat Engine
chat_response = index.as_chat_engine().chat(
    "What is llama_index?",
    chat_history=[
        ChatMessage(role="system", content="You are an expert on RAG!")
    ],
)
print(chat_response)
# Retriever
retriever_response = index.as_retriever().retrieve("What is llama_index?")
print(retriever_response)
Log the Index with MLflow: Log the index to MLflow for tracking and versioning:
import mlflow
mlflow.llama_index.autolog() # Enable tracing
with mlflow.start_run() as run:
    mlflow.llama_index.log_model(
        index,
        artifact_path="llama_index",
        engine_type="query",  # Defines the pyfunc and spark_udf inference type
        input_example="hi",  # Infers signature
        registered_model_name="my_llama_index_vector_store",  # Stores an instance in the model registry
    )
    run_id = run.info.run_id
    model_uri = f"runs:/{run_id}/llama_index"
    print(f"Unique identifier for the model location for loading: {model_uri}")
Load the Index and Perform Inference: Load the model and perform inference:
# Inference via Llama Index
index = mlflow.llama_index.load_model(model_uri)
query_response = index.as_query_engine().query("hi")
print(query_response)
# Inference via MLflow PyFunc
index = mlflow.pyfunc.load_model(model_uri)
query_response = index.predict("hi")
print(query_response)
By following these steps, you can integrate LlamaIndex workflows into your multimodal chat application, leveraging the capabilities of LlamaIndex and MLflow for robust AI application development [1].
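Note that the steps above wrap a standard index with MLflow rather than using the event-driven workflow API itself. If by "newly added workflows" you mean the llama_index.core.workflow module (llama-index >= 0.11), a minimal, illustrative sketch is shown below; the class name, step name, and "question" field are assumptions for the example, not a fixed API, while Workflow, step, StartEvent, and StopEvent come from the module itself.

import asyncio
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
from llama_index.llms.openai import OpenAI

class ChatWorkflow(Workflow):
    """Illustrative single-step workflow: answer a user question with an LLM."""

    @step
    async def answer(self, ev: StartEvent) -> StopEvent:
        # Read the "question" keyword passed to workflow.run(...)
        llm = OpenAI(model="gpt-4o-mini")
        response = await llm.acomplete(ev.get("question"))
        # Returning a StopEvent ends the workflow; its result is what run() returns
        return StopEvent(result=str(response))

async def main():
    workflow = ChatWorkflow(timeout=60)
    result = await workflow.run(question="What is llama_index?")
    print(result)

if __name__ == "__main__":
    asyncio.run(main())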
To continue talking to Dosu, mention @dosu.
@dosu where is the workflow here?
The workflow for integrating LlamaIndex with a multimodal chat application is described in the following code snippets:
Toolkit Configuration:
# Imports are version-dependent; in recent releases the LangChain helpers are
# assumed to live under llama_index.core.langchain_helpers (adjust for your version).
from llama_index.core.langchain_helpers.agents import (
    IndexToolConfig,
    LlamaToolkit,
    create_llama_chat_agent,
)
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

index_configs = [
    IndexToolConfig(
        query_engine=query_engine,  # a query engine built elsewhere, e.g. index.as_query_engine()
        name="blah",
        description="blah blah",
        index_query_kwargs={},
        tool_kwargs={"return_direct": True, "return_sources": True},
    ),
]
toolkit = LlamaToolkit(
    index_configs=index_configs,
)
Agent Setup:
prefix_message = "only ever return blahs"
memory = ConversationBufferMemory(memory_key="chat_history")
llm = ChatOpenAI(
    streaming=True,
    temperature=0,
    verbose=True,
)
agent_chain = create_llama_chat_agent(
    toolkit,
    llm,
    memory=memory,
    verbose=True,
    agent_kwargs={"prefix": prefix_message},
    return_sources=True,
)
Running the Agent:
# Must be awaited from an async context (e.g. inside an async function or a notebook cell)
res = await agent_chain.acall("say something cool")
This workflow sets up the necessary configurations and runs the agent to handle chat interactions, returning a JSON string that includes the sources [1].
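Because Streamlit scripts run synchronously, the await agent_chain.acall(...) call above needs an event loop. One illustrative way to drive it (or a Workflow.run(...) call) from the chat input handler, assuming the agent_chain built above, is:

import asyncio
import streamlit as st

if prompt := st.chat_input("Ask the agent something"):
    with st.chat_message("user"):
        st.write(prompt)
    # asyncio.run creates and closes an event loop for this single call
    result = asyncio.run(agent_chain.acall(prompt))
    with st.chat_message("assistant"):
        # acall returns a dict of outputs (the answer plus sources when return_sources=True)
        st.write(result)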
To continue talking to Dosu, mention @dosu.
Question Validation
Question
I want to build a multimodal chat using Streamlit and LlamaIndex workflows, wherein users can upload images or ask text questions. Can you provide best practices and sample code?