run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Multiple unnecessary requests made #10768

gingerwizard closed this issue 5 months ago

gingerwizard commented 9 months ago

Question

I have a simple Streamlit app below. It generally works, using the SQLAutoVectorQueryEngine to combine a SQL database of Stack Overflow survey responses with a vector index of Hacker News posts.

The issue is that when I run a query, every request is issued up to 8 times. This causes OpenAI to respond with 429 rate-limit errors.

INFO:llama_index.core.query_engine.sql_join_query_engine:> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
> Querying SQL database: This choice is most relevant because it involves translating a natural language query into a SQL query over a table containing surveys. This could potentially allow us to find out the top 3 web technologies people most want to use.
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"

I can't reproduce this outside of Streamlit.

import logging
import sys

import streamlit as st
from clickhouse_connect import common
from llama_index.legacy.embeddings import FastEmbedEmbedding
from llama_index.llms.openai import OpenAI

from llama_index.core import VectorStoreIndex, PromptTemplate
from llama_index.core import Settings
from llama_index.core.indices.struct_store import NLSQLTableQueryEngine
from llama_index.core.indices.vector_store import VectorIndexAutoRetriever
from llama_index.core.prompts import PromptType
from llama_index.core.query_engine import RetrieverQueryEngine, SQLAutoVectorQueryEngine
from llama_index.core.tools import QueryEngineTool
from llama_index.core.vector_stores.types import VectorStoreInfo, MetadataInfo
from llama_index.vector_stores.clickhouse import ClickHouseVectorStore
import clickhouse_connect
import openai
from sqlalchemy import (
    create_engine,
)
from llama_index.core import SQLDatabase

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

host = "localhost"
password = ""
username = "default"
secure = True
http_port = 8123
native_port = 9000
open_ai_model = "gpt-4"

hackernews_table = "hackernews_llama"
stackoverflow_table = "surveys"
database = "default"

Settings.embed_model = FastEmbedEmbedding(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    max_length=384,
)

CLICKHOUSE_TEXT_TO_SQL_TMPL = (
    "Given an input question, first create a syntactically correct ClickHouse SQL "
    "query to run, then look at the results of the query and return the answer. "
    "You can order the results by a relevant column to return the most "
    "interesting examples in the database.\n\n"
    "Never query for all the columns from a specific table, only ask for a "
    "few relevant columns given the question.\n\n"
    "Pay attention to use only the column names that you can see in the schema "
    "description. "
    "Be careful to not query for columns that do not exist. "
    "Pay attention to which column is in which table. "
    "Also, qualify column names with the table name when needed. \n"
    "If needing to group on Array Columns use the ClickHouse function arrayJoin e.g. arrayJoin(columnName) \n"
    "For example, the following query identifies the most popular database:\n"
    "SELECT database_want_to_work_with, count(*) AS count FROM so_surveys GROUP BY "
    "arrayJoin(database_want_to_work_with) AS database_want_to_work_with ORDER BY count DESC LIMIT 1\n"
    "You are required to use the following format, each taking one line:\n\n"
    "Question: Question here\n"
    "SQLQuery: SQL Query to run\n"
    "SQLResult: Result of the SQLQuery\n"
    "Answer: Final answer here\n\n"
    "Only use tables listed below.\n"
    "{schema}\n\n"
    "Question: {query_str}\n"
    "SQLQuery: "
)

CLICKHOUSE_TEXT_TO_SQL_PROMPT = PromptTemplate(
    CLICKHOUSE_TEXT_TO_SQL_TMPL,
    prompt_type=PromptType.TEXT_TO_SQL,
)

st.set_page_config(
    page_title="Get summaries of Hacker News posts enriched with Stack Overflow survey results, powered by LlamaIndex and ClickHouse",
    page_icon="πŸ¦™πŸš€", layout="centered", initial_sidebar_state="auto", menu_items=None)
st.title("πŸ’¬HackBot powered by LlamaIndex πŸ¦™ and ClickHouse πŸš€")
st.info(
    "Check out the full [blog post](https://blog.streamlit.io/build-a-chatbot-with-custom-data-sources-powered-by-llamaindex/) for this app",
    icon="πŸ“ƒ")
st.caption("A streamlit chatbot for Hacker News powered by πŸ’¬πŸ¦™ and ClickHouse πŸš€")

# What are people saying about ClickHouse?
@st.cache_resource
def clickhouse():
    common.set_setting('autogenerate_session_id', False)
    return clickhouse_connect.get_client(
        host=host, port=http_port, username=username, password=password,
        secure=secure
    )

@st.cache_resource(show_spinner=False)
def sql_auto_vector_query_engine():
    with st.spinner(text="Preparing indexes. This should take a few seconds. No time to make πŸ«–"):
        engine = create_engine(
            f'clickhouse+native://{username}:{password}@{host}:' +
            f'{native_port}/{database}?compression=lz4&secure={secure}'
        )
        sql_database = SQLDatabase(engine, include_tables=[stackoverflow_table], view_support=True)
        vector_store = ClickHouseVectorStore(clickhouse_client=clickhouse(), table=hackernews_table)
        vector_index = VectorStoreIndex.from_vector_store(vector_store)
        return sql_database, vector_index

def get_engine():
    sql_database, vector_index = sql_auto_vector_query_engine()

    nl_sql_engine = NLSQLTableQueryEngine(
        sql_database=sql_database,
        tables=[stackoverflow_table],
        text_to_sql_prompt=CLICKHOUSE_TEXT_TO_SQL_PROMPT,
        llm=OpenAI(model=open_ai_model)
    )
    vector_store_info = VectorStoreInfo(
        content_info="Social news posts and comments from users",
        metadata_info=[
            MetadataInfo(
                name="post_score", type="int", description="Score of the comment or post",
            ),
            MetadataInfo(
                name="by", type="str", description="the author or person who posted the comment",
            ),
            MetadataInfo(
                name="time", type="date", description="the time at which the post or comment was made",
            ),
        ]
    )

    vector_auto_retriever = VectorIndexAutoRetriever(
        vector_index, vector_store_info=vector_store_info, similarity_top_k=10,
        # vector_store_kwargs={"where": f"length >= {max_length} AND post_score >= {score} AND time >= '{min_date}'"}
    )

    retriever_query_engine = RetrieverQueryEngine.from_args(vector_auto_retriever, llm=OpenAI(model=open_ai_model))

    sql_tool = QueryEngineTool.from_defaults(
        query_engine=nl_sql_engine,
        description=(
            "Useful for translating a natural language query into a SQL query over"
            f" a table containing: {stackoverflow_table}, containing the survey responses on"
            f" different types of technology users currently use and want to use"
        ),
    )
    vector_tool = QueryEngineTool.from_defaults(
        query_engine=retriever_query_engine,
        description=(
            f"Useful for answering semantic questions abouts users comments and posts"
        ),
    )

    return SQLAutoVectorQueryEngine(
        sql_tool, vector_tool, llm=OpenAI(model=open_ai_model)
    )

@st.cache_data
def query(prompt):
    return str(st.session_state.engine.query(prompt))

if "max_score" not in st.session_state.keys():
    client = clickhouse()
    st.session_state.max_score = int(
        client.query("SELECT max(post_score) FROM default.hackernews_llama").first_row[0])
    st.session_state.max_length = int(
        client.query("SELECT max(length) FROM default.hackernews_llama").first_row[0])
    st.session_state.min_date, st.session_state.max_date = client.query(
        "SELECT min(toDate(time)), max(toDate(time)) FROM default.hackernews_llama WHERE time != '1970-01-01 00:00:00'").first_row

if "messages" not in st.session_state:
    st.session_state.messages = [
        {"role": "assistant", "content": "Ask me a question about opinions on Hacker News and Stackoverflow!"}]

with st.sidebar:
    score = st.slider('Min Score', 0, st.session_state.max_score, value=0)
    max_length = st.slider('Min comment Length (tokens)', 0, st.session_state.max_length, value=20)
    min_date = st.date_input('Min comment date', value=st.session_state.min_date, min_value=st.session_state.min_date,
                             max_value=st.session_state.max_date)
    openai_api_key = st.text_input("OpenAI API Key", key="chatbot_api_key", type="password")
    openai.api_key = openai_api_key
    "[Get an OpenAI API key](https://platform.openai.com/account/api-keys)"
    "[View the source code](https://github.com/clickhouse/examples/blob/main/Chatbot.py)"
    "[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/ClickHouse/examples?quickstart=1)"

if not openai_api_key:
    st.info("Please add your OpenAI API key to continue.")
    st.stop()

if openai_api_key and "engine" not in st.session_state.keys():
    st.session_state.engine = get_engine()

if prompt := st.chat_input(placeholder="Your question about Hacker News"):
    st.session_state.messages.append({"role": "user", "content": prompt})

for message in st.session_state.messages:  # Display the prior chat messages
    with st.chat_message(message["role"]):
        st.write(message["content"])

# If last message is not from assistant, generate a new response
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant"):
        with st.spinner("Thinking..."):
            response = query(prompt)
            st.write(response)
            st.session_state.messages.append({"role": "assistant", "content": response})

dosubot[bot] commented 9 months ago

πŸ€–

Based on the information you've provided, it seems like the issue is related to the rate limit of OpenAI's API, which is being hit due to multiple requests being issued by your Streamlit app. This is likely due to the SQLAutoVectorQueryEngine querying the OpenAI API multiple times.

To resolve this issue, you could implement a caching mechanism to store the results of previous queries. This way, if the same query is made again, the result can be fetched from the cache instead of making a new request to the OpenAI API. This would significantly reduce the number of requests made to the API and help avoid hitting the rate limit.

Here's a simple example of how you could implement caching using Python's built-in functools library:

import functools

# query_engine here refers to the SQLAutoVectorQueryEngine built above.
@functools.lru_cache(maxsize=128)
def cached_query(query):
    # Identical queries are answered from the in-memory cache instead of
    # triggering new OpenAI requests.
    return query_engine.query(query)

In this example, functools.lru_cache is a decorator that adds caching to the cached_query function. The maxsize parameter determines the maximum number of recent queries that will be stored in the cache. If a query is made that is already in the cache, the cached result will be returned instead of making a new request to the OpenAI API.

You would then use cached_query instead of query_engine.query in your Streamlit app:

response = cached_query(prompt)

This should help reduce the number of requests made to the OpenAI API and avoid hitting the rate limit.

As for the SQLAutoVectorQueryEngine, it is designed to handle queries to both a SQL database and a vector database. When an API request is received, it first decides whether it needs to query the SQL database or the vector store. If it decides to query the SQL database, it will also decide whether to augment information with retrieved results from the vector store. The SQLAutoVectorQueryEngine uses the VectorIndexAutoRetriever to retrieve results from the vector store.
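For reference, condensed from your code above, the wiring that exercises this flow is:

query_engine = SQLAutoVectorQueryEngine(
    sql_tool, vector_tool, llm=OpenAI(model="gpt-4")
)
# Each call routes between the SQL tool and the vector tool as described.
response = query_engine.query("What are the top 3 web technologies people most want to use?")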

You can find more information about the SQLAutoVectorQueryEngine in the LlamaIndex documentation and the source code.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.


logan-markewich commented 9 months ago

@gingerwizard is this maybe Streamlit refreshing the page/code, causing the code to rerun? I know Streamlit has some weird behaviour sometimes when things change (which might also explain why you can't reproduce this outside of Streamlit).
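
For illustration, here's a minimal sketch of that rerun behaviour (build_engine() is a hypothetical stand-in for expensive setup such as get_engine() above): Streamlit re-executes the whole script on every widget interaction, so anything not cached or kept in session state is rebuilt each time.

import streamlit as st

def build_engine():
    # Hypothetical stand-in for expensive setup such as get_engine() above.
    return object()

# Streamlit re-runs this entire script top-to-bottom on every interaction.
if "engine" not in st.session_state:
    # Runs only once per browser session; later reruns reuse the stored object.
    st.session_state.engine = build_engine()

if "reruns" not in st.session_state:
    st.session_state.reruns = 0
st.session_state.reruns += 1
st.write(f"Script executions this session: {st.session_state.reruns}")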

logan-markewich commented 9 months ago

I would suggest simplifying your code a bit: make the simplest version that reproduces the issue, and we can solve it from there.

gingerwizard commented 9 months ago

Definitely related to Streamlit. Does LlamaIndex make any guarantees about thread safety for certain objects, e.g. indexes or query engines, or should we recreate these for every request?

logan-markewich commented 9 months ago

@gingerwizard I would be wary of thread safety. But maybe you can cache the result of get_engine()? Or use a global?
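
One way to apply that (a sketch, reusing the get_engine() from the original post) is to let Streamlit cache the constructed engine for the lifetime of the process:

import streamlit as st

@st.cache_resource(show_spinner=False)
def cached_engine():
    # Built once per process; every rerun and every session receives the same
    # SQLAutoVectorQueryEngine rather than reconstructing it (and its LLM clients).
    return get_engine()

engine = cached_engine()

Note that @st.cache_resource shares the returned object across all sessions and threads, so the thread-safety caveat above still applies.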

dosubot[bot] commented 6 months ago

Hi, @gingerwizard,

I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the LlamaIndex Streamlit app was issuing multiple unnecessary requests, causing 429 errors from the OpenAI API, and the issue couldn't be reproduced outside of Streamlit. The suggestions from dosubot and logan-markewich led to implementing a caching mechanism to store the results of previous queries, reducing the number of requests made to the OpenAI API, and to simplifying the code to a minimal reproduction. There was also a discussion about thread-safety guarantees for certain objects in the library, with caching the result of get_engine() or using a global suggested as workarounds.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
