gingerwizard closed this issue 5 months ago.
🤖 Based on the information you've provided, it seems like the issue is related to the rate limit of OpenAI's API, which is being hit due to multiple requests being issued by your Streamlit app. This is likely due to the SQLAutoVectorQueryEngine querying the OpenAI API multiple times.
To resolve this issue, you could implement a caching mechanism to store the results of previous queries. This way, if the same query is made again, the result can be fetched from the cache instead of making a new request to the OpenAI API. This would significantly reduce the number of requests made to the API and help avoid hitting the rate limit.
Here's a simple example of how you could implement caching using Python's built-in functools module:
import functools

@functools.lru_cache(maxsize=128)
def cached_query(query):
    # Repeated queries are served from the in-memory cache instead of
    # triggering another request to the OpenAI API.
    return query_engine.query(query)
In this example, functools.lru_cache is a decorator that adds caching to the cached_query function. The maxsize parameter determines the maximum number of recent queries that will be stored in the cache. If a query is made that is already in the cache, the cached result will be returned instead of making a new request to the OpenAI API.
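A quick way to confirm the cache is actually absorbing the duplicate calls is lru_cache's built-in hit/miss counters (a standalone sketch using a stand-in function, independent of the query engine):

import functools

@functools.lru_cache(maxsize=128)
def stand_in(x):
    # Placeholder for an expensive call such as query_engine.query.
    return x * 2

stand_in(1)
stand_in(1)  # second call is a cache hit
print(stand_in.cache_info())  # CacheInfo(hits=1, misses=1, maxsize=128, currsize=1)

The same cached_query.cache_info() call would show how many times Streamlit re-issued an identical prompt.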
You would then use cached_query instead of query_engine.query in your Streamlit app:
response = cached_query(prompt)
This should help reduce the number of requests made to the OpenAI API and avoid hitting the rate limit.
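Alternatively, Streamlit ships its own caching decorator that persists across reruns (a minimal sketch, assuming Streamlit ≥ 1.18 for st.cache_data; the response is converted to a string because st.cache_data pickles return values, and LlamaIndex response objects may not pickle cleanly):

import streamlit as st

@st.cache_data
def cached_query_text(prompt: str) -> str:
    # Streamlit memoises on the prompt and reuses the cached string
    # across script reruns instead of calling the OpenAI API again.
    return str(query_engine.query(prompt))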
As for the SQLAutoVectorQueryEngine, it is designed to handle queries against both a SQL database and a vector store. When a query comes in, it first decides whether to send it to the SQL database or the vector store. If it queries the SQL database, it also decides whether to augment the SQL results with information retrieved from the vector store. The SQLAutoVectorQueryEngine uses the VectorIndexAutoRetriever to retrieve results from the vector store.
You can find more information about the SQLAutoVectorQueryEngine in the LlamaIndex documentation and the source code.
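For reference, wiring the engine together typically looks something like this (a minimal sketch following the pre-0.10 llama_index import layout, which may differ in newer releases; sql_query_engine and vector_index are hypothetical stand-ins for your own objects):

from llama_index.indices.vector_store import VectorIndexAutoRetriever
from llama_index.query_engine import RetrieverQueryEngine, SQLAutoVectorQueryEngine
from llama_index.tools import QueryEngineTool
from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo

# Describe the vector store so the auto-retriever can infer metadata filters.
vector_store_info = VectorStoreInfo(
    content_info="Hacker News posts",
    metadata_info=[MetadataInfo(name="score", type="int", description="Post score")],
)
vector_query_engine = RetrieverQueryEngine.from_args(
    VectorIndexAutoRetriever(vector_index, vector_store_info=vector_store_info)
)

sql_tool = QueryEngineTool.from_defaults(
    query_engine=sql_query_engine,
    description="Translates natural-language questions into SQL over Stack Overflow answers",
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Answers semantic questions about Hacker News posts",
)

# The engine routes each query to the SQL tool or the vector tool.
query_engine = SQLAutoVectorQueryEngine(sql_tool, vector_tool)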
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
@gingerwizard is this maybe Streamlit refreshing the page/code, causing the code to rerun? I know Streamlit has some weird behaviour sometimes when things change (which may explain why you can't reproduce this outside of Streamlit).
I would suggest simplifying your code a bit and making the simplest version that reproduces the issue, and then we can solve it from there.
Definitely related to Streamlit. Does LlamaIndex make any guarantees about thread safety for certain objects, e.g. indexes or query engines, or should we recreate these for every request?
@gingerwizard I would be wary of thread safety. But maybe you can cache the result of get_engine()? Or use a global?
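In Streamlit that caching could look like the following (a minimal sketch, assuming Streamlit ≥ 1.18 for st.cache_resource; get_engine is the hypothetical factory mentioned above):

import streamlit as st

@st.cache_resource
def get_engine():
    # Construct indexes and the SQLAutoVectorQueryEngine here; Streamlit
    # caches the returned object across reruns and sessions, so the
    # engine is built once rather than on every script rerun.
    ...

engine = get_engine()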
Hi, @gingerwizard,
I'm helping the LlamaIndex team manage their backlog and am marking this issue as stale. From what I understand, the LlamaIndex Streamlit app was issuing multiple unnecessary requests, causing errors with 429s from the OpenAI API. The issue couldn't be reproduced outside of Streamlit. It looks like the suggestions from dosubot and logan-markewich led to implementing a caching mechanism to store the results of previous queries, reducing the number of requests made to the OpenAI API. Additionally, the code was simplified to reproduce the issue and then solved from there. There was also a discussion about thread safety guarantees for certain objects in the llama library, and caching the result of get_engine() or using a global was suggested.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days.
Question Validation
Question
I have a simple Streamlit app below. This generally works and uses the SQLAutoVectorQueryEngine to merge a SQL database of Stack Overflow answers and a vector index of Hacker News posts. The issue is that when I run the query, it issues every request up to 8 times. This causes OpenAI to error with 429s.
I can't reproduce this outside of Streamlit.