yhat / pandasql

sqldf for pandas
MIT License
1.31k stars 184 forks source link

sqldf does not work #103

Open psads-git opened 1 year ago

psads-git commented 1 year ago

The following code when run

import pandas as pd
from pandasql import sqldf

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})

query = "SELECT * FROM df WHERE col1 > 2"

result = sqldf(query, globals())
print(result)

gives the following error:

Output exceeds the size limit. Open the full output data in a text editor
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
File ~/.virtualenvs/r-reticulate/lib64/python3.11/site-packages/sqlalchemy/engine/base.py:1410, in Connection.execute(self, statement, parameters, execution_options)
   1409 try:
-> 1410     meth = statement._execute_on_connection
   1411 except AttributeError as err:

AttributeError: 'str' object has no attribute '_execute_on_connection'

The above exception was the direct cause of the following exception:

ObjectNotExecutableError                  Traceback (most recent call last)
Cell In[1], line 11
      8 query = "SELECT * FROM df WHERE col1 > 2"
     10 # Execute the query using pandasql
---> 11 result = sqldf(query, globals())
     13 print(result)

File ~/.virtualenvs/r-reticulate/lib64/python3.11/site-packages/pandasql/sqldf.py:156, in sqldf(query, env, db_uri)
    124 def sqldf(query, env=None, db_uri='sqlite:///:memory:'):
    125     """
    126     Query pandas data frames using sql syntax
    127     This function is meant for backward compatibility only. New users are encouraged to use the PandaSQL class.
   (...)
    154     >>> sqldf("select avg(x) from df;", locals())
...
   1416         distilled_parameters,
   1417         execution_options or NO_OPTIONS,
   1418     )

ObjectNotExecutableError: Not an executable object: 'SELECT * FROM df WHERE col1 > 2'

Could someone please help me?

Isaac-Flath commented 1 year ago

I just debugged this earlier today for a coworker and found downgrading SQLAlchemy resolved it via pip install SQLAlchemy==1.4.46.

I think pandasql is not compatible with SQLalchemy 2 and above.

See https://github.com/yhat/pandasql/issues/102

psads-git commented 1 year ago

Thanks, Isaac: Indeed, by downgrading SQLAlchemy to the version you suggest fixes the problem.

hrshdhgd commented 1 year ago

If you want to use SQLAlchemy v2.x, I think merging my PR will resolve this and 2 other issues.