pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.88k stars 18.03k forks source link

REGR: 1.4.0rc0: pd.read_sql not compatible with pymysql #45416

Closed auderson closed 2 years ago

auderson commented 2 years ago

Pandas version checks

Reproducible Example

import pymysql as sql
import pandas as pd

mysql_account = {}

con = sql.connect(**mysql_account)
pd.read_sql('', con)

Issue Description

the above code raises ValueError: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connection

Expected Behavior

Is pymysql not supported anymore?

Installed Versions

INSTALLED VERSIONS ------------------ commit : d023ba755322e09b95fd954bbdc43f5be224688e python : 3.8.10.final.0 python-bits : 64 OS : Linux OS-release : 5.8.0-63-generic Version : #71-Ubuntu SMP Tue Jul 13 15:59:12 UTC 2021 machine : x86_64 processor : x86_64 byteorder : little LC_ALL : None LANG : en_US.UTF-8 LOCALE : en_US.UTF-8 pandas : 1.4.0rc0 numpy : 1.20.3 pytz : 2021.3 dateutil : 2.8.2 pip : 21.2.4 setuptools : 58.0.4 Cython : 0.29.24 pytest : 6.2.5 hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : 4.6.2 html5lib : 1.1 pymysql : 1.0.2 psycopg2 : 2.8.6 (dt dec pq3 ext lo64) jinja2 : 2.11.3 IPython : 7.29.0 pandas_datareader: 0.9.0 bs4 : None bottleneck : None fsspec : 2021.11.1 fastparquet : None gcsfs : None matplotlib : 3.5.0 numexpr : 2.7.3 odfpy : None openpyxl : 3.0.7 pandas_gbq : None pyarrow : 6.0.1 pyxlsb : None s3fs : None scipy : 1.7.3 sqlalchemy : 1.4.27 tables : None tabulate : 0.8.9 xarray : None xlrd : None xlwt : None numba : 0.54.1 zstandard : None
simonjayhawkins commented 2 years ago

The ValueError raised was added in #42546 cc @fangchenli

fangchenli commented 2 years ago

We currently only support pymysql through sqlalchemy.

asishm commented 2 years ago

Confirming this is breaking in 1.4.0rc0

It used to work earlier (prior to the linked PR) as the con object was getting passed into SQLiteDatabase which looks generic enough to accept any DBAPI2 connection obj.

Jefffish09 commented 2 years ago

Same error when using pymssql in 1.4.0rc0

image

jorisvandenbossche commented 2 years ago

We have documented that we only support sqlite3 for plain DBAPI2 connection objects:

https://github.com/pandas-dev/pandas/blob/21b7dafcb24305ea47ef5a5ee6e7475f768bcdd0/pandas/io/sql.py#L462-L464

But it is true that any connection has implicitly worked up to now as long as you are just reading a query (not reading a table or writing). Given that this has worked for so long, we should probably not just change it without warning.

asishm commented 2 years ago

Thanks @jorisvandenbossche 2 clarifying points:

  1. Reading a table didn't work for sqlite3 connections (checked on 1.3.4) or any other DBAPI2 connection (maybe deserves a new issue here) i.e. pd.read_sql('table_name', sqlite3_conn) fails with DatabaseError: Execution failed on sql 'table_name': near "table_name": syntax error. The decision whether to pass to read_sql_table only seems to be happening for sqlalchemy connections
  2. Writing to a table never worked with a non-sqlite3 DBAPI2 connection as it would try to write/create to a sqlite3 instance (which likely would not exist)
simonjayhawkins commented 2 years ago

Given that this has worked for so long, we should probably not just change it without warning.

@jorisvandenbossche Is this severe enough to block 1.4.0?

simonjayhawkins commented 2 years ago

we could maybe revert #42546 for now