Closed sofgun1 closed 4 months ago
Answer generated by a 🤖
The error you're encountering is due to a conflict between the versions of SQLAlchemy and google-cloud-bigquery libraries you're using. The error message TypeError: SQLCompiler.__init__() got multiple values for argument 'cache_key'
indicates that the cache_key
argument is being passed multiple times when initializing the SQLCompiler, which is not allowed. This is likely due to changes in the SQLAlchemy library's API in version 2.0.11 that are not compatible with the version of google-cloud-bigquery you're using.
To resolve this issue, you can try downgrading your SQLAlchemy library to a version that is compatible with google-cloud-bigquery. For example, you can try downgrading to SQLAlchemy version 1.4.22. You can do this by running the following command in your terminal:
pip install SQLAlchemy==1.4.22
After downgrading SQLAlchemy, try running your script again. If the problem persists, you may need to adjust the versions of your other libraries to ensure compatibility. Please note that the version 1.4.22 is just an example, you might need to find the exact version that is compatible with your google-cloud-bigquery version. You can find this information in the google-cloud-bigquery's documentation or you can try different versions until you find one that works. [docs/snippets/modules/chains/popular/sqlite.mdx, docs/extras/guides/evaluation/sql_qa_benchmarking_chinook.ipynb, tests/integration_tests/chains/test_sql_database.py, langchain/chains/sql_database/base.py, docs/extras/ecosystem/integrations/databricks/databricks.ipynb]
This response is meant to be useful, save you time, and share context. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
Hi, did you manage to resolve this issue? I'm having a similar problem
Same here, I encountered similar error and cannot find any solution
Downgrading "sqlalchemy-bigquery" to 1.7.0 gets it to work, but with the latest version (1.8.0) the same problem occurs.
pip install sqlalchemy-bigquery==1.7.0
Hi, @sofgun1! I'm Dosu, and I'm here to help the LangChain team manage their backlog. I wanted to let you know that we are marking this issue as stale.
From what I understand, you encountered a TypeError
when trying to run an example from the SQLDatabaseChain
module. It seems that there was a conflict between the versions of SQLAlchemy and google-cloud-bigquery libraries being used. @abdulmateen59 suggested downgrading "sqlalchemy-bigquery" to version 1.7.0 as a workaround for this issue.
Before we close this issue, we wanted to check with you if it is still relevant to the latest version of the LangChain repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.
Thank you for your understanding and contribution to the LangChain project!
I'm facing the same problem. Using 1.7.0 version for "sqlalchemy-bigquery" is not working and still same persists.
Any alternate solution will be helpful.
i'm facing the same problem
@Krithika-Devi did you solve this ?
Solved with :
# replace "x" with your project info
creds_file = "creds.json"
project = "x"
dataset= "x"
sqlalchemy_url = f"bigquery://{project}/{dataset}?credentials_path={creds_file}"
db = SQLDatabase.from_uri(sqlalchemy_url)
# your code
toolkit = SQLDatabaseToolkit(db=db, llm=langchain_model.llm)
input = f"""
I want to know the 5 top results of NUMCLI :
- table name : {table_name}
"""
agent_executor = create_sql_agent(
llm=llm,
toolkit=toolkit,
verbose=True,
top_k=1000,
handle_parsing_errors=True,
)
agent_executor.run(input)
dependencies :
Any solution to this problem?
Any solution to this problem?
@mfernandezsidn You can try to change the version of your dependencies :
Sqlalchemy 1.4.11 Sqalchemy-bigquery 1.9.0 Sqlitedict 2.1.0 Langchain 0.0.352 Google cloud bigquery 3.14.1 Google cloud bigquery storage 2.24.0
I also had a problem because of two sqlachemy libraries . Langchain was using the wrong one, so I had to delete it. Make sure that langchain is calling the right library
@AmineDjeghri is it important the langchain version? I need that version because of compatibilities with another libraries.
My LLM is being correctly executed (TextGenerationModel.from_pretrained("text-bison-32k"))
` agent_executor = create_sql_agent( llm=llm, toolkit=toolkit, verbose=True, top_k=1000, handle_parsing_errors=True, )
output = agent_executor.run(final_prompt)`
When the last line finishes executing, it returns the error of the cache. Any idea?
Can you post your error ?
@AmineDjeghri this is the error trace:
Entering new AgentExecutor chain... Se esta ejecutando el call 1 Se esta ejecutando el call 2 Question: What are the top 5 pages with the most impressions on the website in the last 30 days? Thought: To answer this question, I need to find the pages with the most impressions in the last 30 days. I can do this by querying the
searchconsole.searchdata_site_impression
table and filtering for the appropriate date range. I will also order the results by the number of impressions in descending order and limit the results to the top 5. Action: sql_db_query Action Input:sql SELECT page, SUM(impressions) AS total_impressions FROM `searchconsole.searchdata_site_impression` WHERE DATE BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) AND CURRENT_DATE() GROUP BY page ORDER BY total_impressions DESC LIMIT 5;
C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_community\utilities\sql_database.py:413: SAWarning: Dialect bigquery:bigquery will not make use of SQL compilation caching as it does not set the 'supports_statement_cache' attribute toTrue
. This can have significant performance implications including some performance degradations in comparison to prior SQLAlchemy versions. Dialect maintainers should seek to set this attribute to True after appropriate development and testing for SQLAlchemy 1.4 caching support. Alternatively, this attribute may be set to False which will disable this warning. (Background on this warning at: https://sqlalche.me/e/20/cprf) cursor = connection.execute(text(command)) Traceback (most recent call last): File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\gradio\routes.py", line 534, in predict output = await route_utils.call_process_api( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\gradio\route_utils.py", line 226, in call_process_api output = await app.get_blocks().process_api( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\gradio\blocks.py", line 1550, in process_api result = await self.call_function( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\gradio\blocks.py", line 1185, in call_function prediction = await anyio.to_thread.run_sync( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\anyio\to_thread.py", line 56, in run_sync return await get_async_backend().run_sync_in_worker_thread( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\anyio_backends_asyncio.py", line 2134, in run_sync_in_worker_thread
return await future ^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\anyio_backends_asyncio.py", line 851, in run result = context.run(func, args) ^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\gradio\utils.py", line 661, in wrapper response = f(args, kwargs) ^^^^^^^^^^^^^^^^^^ File "c:\Users\my.user\Downloads\datalitica_dash\app\services\langchain\test.py", line 87, in bq_qna output = agent_executor.run(final_prompt) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_core_api\deprecation.py", line 145, in warning_emitting_wrapper return wrapped(*args, *kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\chains\base.py", line 538, in run return self(args[0], callbacks=callbacks, tags=tags, metadata=metadata)[ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_core_api\deprecation.py", line 145, in warning_emitting_wrapper return wrapped(args, kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\chains\base.py", line 363, in call return self.invoke( ^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\chains\base.py", line 162, in invoke raise e File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\chains\base.py", line 156, in invoke self._call(inputs, run_manager=run_manager) File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\agents\agent.py", line 1371, in _call next_step_output = self._take_next_step( ^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\agents\agent.py", line 1097, in _take_next_step [ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\agents\agent.py", line 1097, in[ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain\agents\agent.py", line 1193, in _iter_next_step observation = tool.run( ^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_core\tools.py", line 374, in run raise e File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_core\tools.py", line 346, in run self._run(*tool_args, run_manager=run_manager, tool_kwargs) File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_community\tools\sql_database\tool.py", line 43, in _run
return self.db.run_no_throw(query) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_community\utilities\sql_database.py", line 484, in run_no_throw return self.run(command, fetch, include_columns) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_community\utilities\sql_database.py", line 436, in run
result = self._execute(command, fetch) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\langchain_community\utilities\sql_database.py", line 413, in _execute
cursor = connection.execute(text(command)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\sqlalchemy\engine\base.py", line 1416, in execute return meth( ^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\sqlalchemy\sql\elements.py", line 517, in _execute_on_connection
return connection._execute_clauseelement( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\sqlalchemy\engine\base.py", line 1631, in _execute_clauseelement
compiled_sql, extracted_params, cache_hit = elem._compile_w_cache( ^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\sqlalchemy\sql\elements.py", line 718, in _compile_w_cache compiled_sql = self._compiler( ^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\sqlalchemy\sql\elements.py", line 317, in _compiler return dialect.statement_compiler(dialect, self, kw) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "C:\Users\my.user\AppData\Local\anaconda3\envs\datalitica\Lib\site-packages\pybigquery\sqlalchemy_bigquery.py", line 137, in init super(BigQueryCompiler, self).init(dialect, statement, column_keys, inline, **kwargs) TypeError: SQLCompiler.init() got multiple values for argument 'cache_key'
These are the lines of code:
` llm = VertexLLM( model_name='text-bison-32k', max_output_tokens=256, temperature=0, top_p=0.8, top_k=40, verbose=True, )
table_names_list = list(table_names_options)
db = SQLDatabase.from_uri(f"bigquery://{project_id}") # Establish the database connection
toolkit = SQLDatabaseToolkit(db=db, llm=llm)
vertexai.init(project=project_id, location="us-central1")
_googlesql_prompt = """Any Questions Asked needs to be first Translated to English and then believe that You are a GoogleSQL expert. Given an input question, first create a syntactically correct GoogleSQL query to run, then look at the results of the query and return the answer to the input question.
Unless the user specifies in the question a specific number of examples to obtain, query for at most {top_k} results using the LIMIT clause as per GoogleSQL. You can order the results to return the most informative data in the database.
Never query for all columns from a table. You must query only the columns that are needed to answer the question. Wrap each column name in backticks (`) to denote them as delimited identifiers.
Pay attention to use only the column names you can see in the tables below. Be careful to not query for columns that do not exist. Also, pay attention to which column is in which table.
Use the following format:
Question: "Question here"
SQLQuery: "SQL Query to run"
SQLResult: "Result of the SQLQuery"
Answer: "Final answer here"
Only use the following tables:
{table_info}
If someone asks for aggregation on a STRING data type column, then CAST column as NUMERIC before you do the aggregation.
If someone asks for a specific month, use ActivityDate between the current month's start date and the current month's end date
If someone asks for column names in the table, use the following format:
SELECT column_name
FROM `{project_id}.{dataset_id}`.INFORMATION_SCHEMA.COLUMNS
WHERE table_name in {table_info}
Question: {input}"""
GOOGLESQL_PROMPT = PromptTemplate(
input_variables=["input", "table_info", "top_k", "project_id", "dataset_id"],
template=_googlesql_prompt,
)
final_prompt = GOOGLESQL_PROMPT.format(
input=question,
project_id=project_id,
dataset_id=dataset_id,
table_info=table_names_list,
top_k=10000
)
agent_executor = create_sql_agent(
llm=llm,
toolkit=toolkit,
verbose=True,
top_k=1000,
handle_parsing_errors=True,
)
output = agent_executor.run(final_prompt)
`
VertexLLM is a custom class:
`
class VertexLLM(VertexCommon, LLM):
def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
print("Se esta ejecutando el call 1")
"""
This function is to make predictions using the specified model.
"""
return self._predict(prompt, stop)
`
And VertexCommon:
` class CustomMetaClass(ModelMetaclass): def new(mcs, name, bases, attrs): cls = type.new(mcs, name, bases, attrs) return cls
class VertexCommon(BaseModel, metaclass=CustomMetaClass): """ This class contains common properties and methods for Vertex AI models. """ client = TextGenerationModel.from_pretrained("text-bison-32k") model_name: str = "text-bison-32k" temperature: float = 0.2 top_p: int = 0.8 top_k: int = 40 max_output_tokens: int = 1024
@property
def _default_params(self):
return {
"temperature": self.temperature,
"top_p": self.top_p,
"top_k": self.top_k,
"max_output_tokens": self.max_output_tokens
}
def _predict(self, prompt: str, stop: Optional[List[str]]) -> str:
"""
This function is to make predictions based on a given prompt.
"""
res = self.client.predict(prompt, **self._default_params)
print("Se esta ejecutando el call 2")
return self._enforce_stop_words(res.text, stop)
def _enforce_stop_words(self, text: str, stop: Optional[List[str]]) -> str:
"""
This function is to enforce stop words in the generated text.
"""
if stop:
return enforce_stop_tokens(text, stop)
return text
@property
def _llm_type(self) -> str:
return "vertex_ai"
`
The code is basically from: https://medium.com/@rakeshmohandas/effortless-multilingual-data-querying-a-generative-ai-system-for-bigquery-nlqs-8be06ab18f5b
The problem seems to be with the pybigquery but I don't see you are using it in the list of dependencies you gave me. I have removed the library but I do receive the error:
File "C:\Users\my.user\AppData\Local\anaconda3\envs\ddd\Lib\site-packages\sqlalchemy\util\langhelpers.py", line 374, in load raise exc.NoSuchModuleError( sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:bigquery
That was the exact problem!! The pybigquery was messing everything!
It's the same library that caused me a problem and i needed to remove. Glad you found the source of the problem
🤖
It seems like you're encountering a TypeError
when trying to run a query using the SQLDatabaseChain
with a BigQuery database. The error message indicates that there's an issue with the SQLCompiler.__init__()
method receiving multiple values for the argument 'cache_key'.
This error might be due to a compatibility issue between the versions of SQLAlchemy
and google-cloud-bigquery
you're using. The SQLAlchemy
version you're using is 2.0.11
, which is quite recent, and there might be some changes in this version that are not yet fully compatible with google-cloud-bigquery
.
As a first step, I would recommend trying to downgrade your SQLAlchemy
version to 1.4.x
and see if the issue persists. You can do this by running the following command in your terminal:
pip install SQLAlchemy==1.4.22
If downgrading SQLAlchemy
doesn't resolve the issue, you might also want to try downgrading google-cloud-bigquery
to a version that is known to work well with SQLAlchemy 1.4.x
, such as 2.6.2
. You can do this by running:
pip install google-cloud-bigquery==2.6.2
Please try these steps and let me know if the issue persists. If it does, it would be helpful if you could provide more information about your environment, such as the operating system you're using and the version of Python. This will help us better understand the issue and provide a more accurate solution.
For more information about the compatibility between SQLAlchemy
and google-cloud-bigquery
, you can refer to the SQLAlchemy Changelog and the google-cloud-bigquery Changelog.
I hope this helps! If you have any other questions or need further clarification, feel free to ask.
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.
If you want to continue the conversation, start your reply with @dosu-bot.
The problem seems to be with the pybigquery but I don't see you are using it in the list of dependencies you gave me. I have removed the library but I do receive the error:
File "C:\Users\my.user\AppData\Local\anaconda3\envs\ddd\Lib\site-packages\sqlalchemy\util\langhelpers.py", line 374, in load raise exc.NoSuchModuleError( sqlalchemy.exc.NoSuchModuleError: Can't load plugin: sqlalchemy.dialects:bigquery
I am getting the same error when I remove the pybigquery library did you find a solution to this?
I also got the same issue and is resolved by lower python version. Below are my dependencies
Python: 3.8.19 pandas==2.0.3 numpy==1.24.4 streamlit==1.36.0 openai==1.33.0 langchain==0.1.20 langchain-openai==0.1.16 faiss-cpu==1.8.0 langchain_community==0.0.38 pymysql==1.1.1 sqlalchemy==2.0.30 pybigquery google-cloud-bigquery sqlalchemy.orm
System Info
While trying to run example from https://python.langchain.com/docs/modules/chains/popular/sqlite I get the following error:
I use a VertexAI model (MODEL_TEXT_BISON_001) as the LLM.
Some essential library versions: langchain == 0.0.206 SQLAlchemy == 2.0.11 ipython == 8.12.1 python == 3.10.10 google-cloud-bigquery == 3.10.0 google-cloud-bigquery-storage == 2.16.2
Who can help?
@hwchase17
Information
Related Components
Reproduction
db
= SQLDatabase.from_uri(f"bigquery://{project_id}/{dataset}") toolkit = SQLDatabaseToolkit(llm=llm, db=db) db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True) db_chain.run("What is the sport that generates highest revenue")`Expected behavior
When running the db_chain, I expect to get an answer from the bigquery database