Closed cocoza4 closed 10 months ago
🤖
Hello @cocoza4! 🙌 I'm Dosu, an AI bot designed to lend a hand with bugs, answer queries, and help you contribute to this project. Think of me as your friendly guide while we wait for a human maintainer. Let's dive into this issue, shall we?
It seems like the issue you're encountering is related to how the SQLAutoVectorQueryEngine is handling table selection when processing queries. As per the LlamaIndex documentation, the SQLAutoVectorQueryEngine does not directly handle table selection. Instead, it relies on the SQL query engine provided to it during initialization.
In your case, it appears that the SQL query engine is not correctly identifying the table to use when processing the query "Tell me about the movie with the highest views". This could be due to a variety of reasons, such as the SQL query engine not being properly configured or the underlying SQL database not containing the expected tables or data.
To troubleshoot this issue, I would recommend the following steps:
Verify that the SQL query engine is correctly configured and can successfully execute queries against your SQL database outside of the Langchain integration. This can help determine if the issue is with the SQL query engine itself or with the integration.
Check the structure and data of your SQL database to ensure it contains the expected tables and data. The SQL query engine relies on this information to correctly process queries.
Review the Langchain integration code to ensure it is correctly initializing the SQLAutoVectorQueryEngine with the SQL query engine. The SQLAutoVectorQueryEngine requires a SQL query engine that is an instance of either BaseSQLTableQueryEngine or NLSQLTableQueryEngine.
If you continue to experience issues after trying these steps, please provide more information about your SQL query engine configuration and the structure and data of your SQL database. This will help me better understand the issue and provide more targeted assistance.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
@dosu-bot after investigating, I still experienced the same issue, here's my SQL query engine code,
from sqlalchemy import (
create_engine,
)
engine = create_engine("sqlite:///trueid.db", future=True)
from llama_index.indices.struct_store.sql_query import NLSQLTableQueryEngine
tables = ["item"]
sql_database = SQLDatabase(engine, include_tables=tables)
sql_query_engine = NLSQLTableQueryEngine(
sql_database=sql_database,
tables=tables,
)
vector_store_info = VectorStoreInfo(
content_info="metadata about different movies/series",
metadata_info=[
MetadataInfo(
name="title", type="str", description="The name of the item"
),
MetadataInfo(
name="release_year", type="str", description="Release year"
),
MetadataInfo(
name="type", type="str", description="Type of the item"
),
MetadataInfo(
name="other_type", type="str", description="Other type of the item"
),
MetadataInfo(
name="vod_main_category", type="str", description="Main category"
),
],
)
vector_auto_retriever = VectorIndexAutoRetriever(
vector_index, vector_store_info=vector_store_info
)
retriever_query_engine = RetrieverQueryEngine.from_args(
vector_auto_retriever, service_context=service_context
)
sql_tool = QueryEngineTool.from_defaults(
query_engine=sql_query_engine,
description=(
"Useful for translating a natural language query into a SQL query over"
" tables containing: item, containing movies/series and its metadata"
),
)
vector_tool = QueryEngineTool.from_defaults(
query_engine=retriever_query_engine,
description=(
"Useful for answering semantic questions about different movies/series"
),
)
query_engine = SQLAutoVectorQueryEngine(
sql_tool, vector_tool, service_context=service_context
)
🤖
Based on the information you've provided, it seems like the issue might be related to the selector logic in the SQLAutoVectorQueryEngine. This selector decides whether to query the SQL database or another query engine. If the selector is not choosing the SQL database when it should, it could lead to the issue you're experiencing.
Here are a few steps you can take to debug this issue:
Check the selector logic: The selector is an instance of either LLMSingleSelector or PydanticSingleSelector. It's used in the _query
method of the SQLJoinQueryEngine class. The select
method of the selector is called with the metadata of the SQL and other query engines, and the query bundle. If the index returned by the select
method is 0, the SQL database is queried; if the index is 1, the other query engine is queried. You can add logging or print statements in the _query
method to check if the selector is working correctly.
Check the SQL query: If the selector is working correctly, the issue might be with the SQL query. You can add logging or print statements in the _query_sql_other
method to check if the SQL query is correctly formed and if it's returning the expected results.
Check the transformation of the query: The _query_sql_other
method transforms the SQL query into a new query using the SQL response. You can add logging or print statements in this method to check if the transformation is correct.
Check the other query engine: If the SQL query and its transformation are correct, the issue might be with the other query engine. You can add logging or print statements in the _query_sql_other
method to check if the other query engine is returning the expected results.
Check the synthesis of the responses: The _query_sql_other
method synthesizes the responses from the SQL database and the other query engine into a final response. You can add logging or print statements in this method to check if the synthesis is correct.
You can find more information about the SQLJoinQueryEngine and its methods in the LlamaIndex repository.
Please try these debugging steps and let me know if you need further assistance.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
it turns out that ChatGPT translates human language into SQL query and passes it to Langchain Tool, as shown in
Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: SQLAutoVectorQueryEngine
Action Input: {"query": "SELECT * FROM movies ORDER BY views DESC LIMIT 1"}
The simple fix is as follows
from pydantic.v1 import BaseModel, Field
# Define the input schema
class QueryInput(BaseModel):
q: str = Field(..., description="Only accept Human language query")
def query_tool(q: str):
try:
return str(query_engine.query(q))
except Exception as ex:
print("Exception thrown:", ex)
tools = [
Tool(
name="SQLAutoVectorQueryEngine",
func=query_tool,
description=f"useful for when you want to answer queries about movies/series. Input is the human language.", # <- specify clearly human language is expected
args_schema=QueryInput, # <- add schema
),
]
Now the input to Langchain tool would be
Entering new AgentExecutor chain...
Thought: Do I need to use a tool? Yes
Action: SQLAutoVectorQueryEngine
Action Input: Tell me about the movie with the highest views
Question Validation
Question
Given a
query_engine
,query_engine.query("Tell me about the series with the highest views")
gives the correct resultHowever, when I integrated with Langchain. Ref: https://github.com/run-llama/llama_index/blob/main/examples/langchain_demo/LangchainDemo.ipynb
The response didn't pick the right tables