trinodb / trino-python-client

Python client for Trino
Apache License 2.0
327 stars 163 forks source link

Installing trino[sqlalchemy] wont allow pandas.read_sql #352

Closed GitToby closed 1 year ago

GitToby commented 1 year ago

Expected behavior

Running a pandas.read_sql or similar fails with AttributeError: 'OptionEngine' object has no attribute 'execute'

Actual behavior

Running a pandas.read_sql or similar should return dataframe with the table results.

Steps To Reproduce

install with trino[sqlalchemy] then run

engine = create_engine(
    "trino://my.instance:443/abcd",
    connect_args={
        "user": user_,
        "auth": BasicAuthentication(user_, pass_),
        "http_scheme": "https",
        "verify": "path/to/bundle.cer",
        # ...
    }
)
df = pd.read_sql("SELECT * FROM system.runtime.nodes", engine)

This is due to the optional dependancy not limiting the sqlalchemt version and hence falls victim to this bug: https://github.com/pandas-dev/pandas/issues/51015

A fix would be to limit the version to sqlalchemy<2.0.0

Log output

No response

Operating System

macos

Trino Python client version

0.322.0

Trino Server version

na

Python version

3.11

Are you willing to submit PR?

GitToby commented 1 year ago

a fix is to add the dependencies

after install this should push the sqlalchemy down and the pandas methods will work.

hashhar commented 1 year ago

This is something that users of this package need to do. They can add explicit constraint for sqlalchemy. In case pandas doesn't work with sqlalchemy 2.x pandas should add the constraint and then pip would automatically resolve to a working version.

Alternatively use pandas 2.0 which works with sqlalchemy 2.x

GitToby commented 1 year ago

at the time of writing pandas 2.0 was released 3 days ago and is not really stable. I suppose this can be skipped internally to the Trino Python client, maybe documentation is a better approach for those who run into this issue.