langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
95.54k stars 15.51k forks source link

SQLDatabase.from_uri Throws an error when used with Vertica #16129

Closed elghali closed 7 months ago

elghali commented 10 months ago

Checked other resources

Example Code

    db = SQLDatabase.from_uri(
        'vertica+vertica_python://user:pwd@x.x.x.x:5433/verticadst',
        schema='myschema'
    )

Description

When I use the above snippet to create an SQLDatabase object for vertica, I get the below error:

sqlalchemy.exc.ProgrammingError: (vertica_python.errors.MissingSchema) Severity: ERROR, Message: Schema "pg_catalog" does not exist

Which makes sense because Vertica does not have pg_catalog schema, but how can I create the object without the reflect part where it tries to read from pg_catalog

System Info

sqlalchemy-vertica-python~=0.6.3 langchain==0.1.0

Related Components

dosubot[bot] commented 10 months ago

🤖

Hello @elghali! :wave:

I'm Dosu - your friendly, neighborhood bot helper. :robot:

I'm here to lend a hand with squashing bugs :bug:, answering your queries and helping you become a LangChain contributor.

While we wait for a human maintainer to join us, consider me your bug-busting buddy. Let's tackle this head-on! :muscle:

And remember, no bug is too big, no question too small. :alien:

It seems like you're encountering an issue with the SQLDatabase.from_uri method when trying to connect to a Vertica database. This is likely due to the fact that Vertica does not have a 'pg_catalog' schema, which is a system schema found in PostgreSQL.

The SQLDatabase.from_uri method in LangChain uses SQLAlchemy's create_engine function to create a database engine from a URI. SQLAlchemy supports a wide range of SQL databases, including Vertica. Therefore, as long as the correct Vertica database URI is provided, the from_uri method should be able to create a connection to a Vertica database.

Here is the relevant code:

@classmethod
def from_uri(
    cls, database_uri: str, engine_args: Optional[dict] = None, **kwargs: Any
) -> SQLDatabase:
    """Construct a SQLAlchemy engine from URI."""
    _engine_args = engine_args or {}
    return cls(create_engine(database_uri, **_engine_args), **kwargs)

In this method, database_uri is the URI of the database you want to connect to. For Vertica, it should be in the format: vertica+pyodbc://<user>:<password>@<dsn>. The engine_args parameter is an optional dictionary of arguments to be passed to SQLAlchemy's create_engine function. The **kwargs parameter is used to pass any additional arguments to the SQLDatabase constructor.

To avoid the error, you need to specify the correct schema that exists in your Vertica database when creating the SQLDatabase object. Here is an example of how you can do this:

from sqlalchemy import create_engine
from libs.community.langchain_community.utilities.sql_database import SQLDatabase

# Create a SQLAlchemy engine for Vertica
engine = create_engine('vertica+pyodbc://<username>:<password>@<dsn_name>')

# Create an SQLDatabase object
db = SQLDatabase(engine, schema='<your_vertica_schema>')

In this example, replace <username>, <password>, <dsn_name>, and <your_vertica_schema> with your actual Vertica username, password, DSN name, and schema respectively.

Please note that you need to have the vertica-python or pyodbc package installed and a valid ODBC connection set up for Vertica. The connection string format may vary depending on your specific setup.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

Sources:

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

liv4unix commented 10 months ago

Thank you for the great instructions, unfortunately the same error persists. I included a snippet in this comment and the full traceback is attached.

langchanin_sqlalchemy_vertica_python_MissingSchema_error.txt

Also, I am using vertica_python instead of odbc to connect langchain to verticadb using sqlalchemy.

------------Error snippet---------------

MissingSchema: Severity: ERROR, Message: Schema "pg_catalog" does not exist, Sqlstate: 3F000, Routine: RangeVarGetObjid, File: /data/jenkins/workspace/RE-ReleaseBuilds/RE-Miner/server/vertica/Catalog/NamespaceLookup.cpp, Line: 319, Error Code: 4650 ....

The above exception was the direct cause of the following exception:

ProgrammingError                          Traceback (most recent call last)
Cell In[24], line 18
     15 engine = sa.create_engine('vertica+vertica_python://dbadmin:password@20.127.231.99:5433/VMart')
     17 #db = SQLDatabase.from_uri("vertica+vertica_python://dbadmin:password@20.127.231.99:5433/VMart")
---> 18 db = SQLDatabase(engine, schema="public")
     20 db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

------------Current Config---------------

import verticapy as vp
import sqlalchemy as sa
from langchain_community.utilities import SQLDatabase
from langchain_experimental.sql import SQLDatabaseChain
from sqlalchemy import create_engine
#from langchain_openai import OpenAI
from langchain.llms import GooglePalm

llm = GooglePalm(google_api_key=api_key, temperature=0.2)
#llm = OpenAI(openai_api_key=api_key, temperature=0.1, verbose=True)

engine = sa.create_engine('vertica+vertica_python://<Vertia_DBA>:<DBA_PASS>@<IP>:<PORT>/<DB>')
#db = SQLDatabase.from_uri("vertica+vertica_python://<Vertia_DBA>:<DBA_PASS>@<IP>:<PORT>/<DB>")
db = SQLDatabase(engine, schema='public')
db_chain = SQLDatabaseChain.from_llm(llm, db, verbose=True)

-----------condensed pip freeze---------------


chroma-hnswlib==0.7.3
chromadb==0.4.22
fastapi==0.109.0
google-ai-generativelanguage==0.4.0
google-api-core==2.15.0
google-auth==2.26.2
google-auth-oauthlib==1.2.0
google-generativeai==0.3.2
google-pasta==0.2.0
googleapis-common-protos==1.62.0
graphviz==0.20.1
huggingface-hub==0.20.2
humanfriendly==10.0
keras==2.15.0
kubernetes==29.0.0
langchain==0.0.339
langchain-community==0.0.13
langchain-core==0.1.13
langchain-experimental==0.0.49
langsmith==0.0.83
libclang==16.0.6
lightning-utilities==0.10.1
llvmlite==0.40.1
Markdown==3.5.2
marshmallow==3.20.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.18.1
nvidia-nvjitlink-cu12==12.3.101
nvidia-nvtx-cu12==12.1.105
onnxruntime==1.16.3
opt-einsum==3.3.0
plotly==5.18.0
portalocker==2.8.2
posthog==3.3.2
proto-plus==1.23.0
protobuf==4.23.4
pulsar-client==3.4.0
pyarrow==13.0.0
pycurl==7.45.1
pydantic==2.5.3
pydantic_core==2.14.6
PyPika==0.48.9
pyproject_hooks==1.0.0
python-dotenv==1.0.0
python-markdown-math==0.8
pytorch-lightning==2.1.3
regex==2023.12.25
requests-oauthlib==1.3.1
rsa==4.9
safetensors==0.4.1
sentence-transformers==2.2.2
sentencepiece==0.1.99
simplegeneric==0.8.1
sqlalchemy-vertica-python==0.6.3
starlette==0.35.1
tenacity==8.2.3
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorflow==2.15.0.post1
tensorflow-estimator==2.15.0
tensorflow-io-gcs-filesystem==0.35.0
termcolor==2.4.0
tokenizers==0.15.0
torch==2.1.2
torchmetrics==1.3.0.post0
torchvision==0.16.2
transformers==4.36.2
triton==2.1.0
typer==0.9.0
typing-inspect==0.9.0
vertica-highcharts==0.1.4
vertica-python==1.3.8
verticapy==1.0.1
elghali commented 10 months ago

Hello @liv4unix, for me what solved the issue was adding the vertica-sqlalchemy-dialect~=0.0.8.11 to the list of pip requirements

liv4unix commented 10 months ago

Thanks for this tip - updating the dialect to 0.0.8.1 fixed my issue, and langchain is now connected to the Vertica 12 database using sqlalchemy and verticapy. Thanks so much!