pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
43.72k stars 17.93k forks source link

BUG: Dataframe's `to_sql` method fails while trying to persist Dataframe to Postgtres in Version 2.2.0 #57196

Open damianr13 opened 9 months ago

damianr13 commented 9 months ago

Pandas version checks

Reproducible Example

import pandas as pd
from sqlalchemy import create_engine

from .settings import get_settings  # pydantic way of reading the env variables

conn_string = get_settings().db_conn_string
df = pd.DataFrame(range(5), columns=["id"])
db = create_engine(conn_string)
conn = db.connect()

nr_rows = df.to_sql(
    "temp_pandas_test", con=conn, if_exists="replace", index=False, chunksize=1000
)

Issue Description

Exporting a pandas dataframe to Postgres doesn't work anymore in version 2.2.0. The code above works well with version 2.1.2, but in version 2.2.0 it fails with the following error:

Traceback (most recent call last):
  File "/Users/robert-andreidamian/Workspace/panprices/shelf_analytics_update_products/demo.py", line 14, in <module>
    nr_rows = df.to_sql(
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/core/generic.py", line 3059, in to_sql
    return sql.to_sql(
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 858, in to_sql
    return pandas_sql.to_sql(
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 2867, in to_sql
    table.create()
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 1000, in create
    if self.exists():
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 986, in exists
    return self.pd_sql.has_table(self.name, self.schema)
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 2882, in has_table
    return len(self.execute(query, [name]).fetchall()) > 0
  File "/Users/robert-andreidamian/.local/share/virtualenvs/shelf_analytics_update_products-BwnpeEoI/lib/python3.9/site-packages/pandas/io/sql.py", line 2689, in execute
    cur = self.con.cursor()
AttributeError: 'Connection' object has no attribute 'cursor'

I noticed that a new driver was added in Version 2.2.0, which might be related to my issue.

Expected Behavior

The code above should execute the same way it used to do with the previous version, and export the dataframe to a table in my Postgres Database.

Installed Versions

INSTALLED VERSIONS

commit : c3014abb3bf2e15fa66d194ebd2867161527c497 python : 3.9.13.final.0 python-bits : 64 OS : Darwin OS-release : 23.1.0 Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:32:52 PDT 2023; root:xnu-10002.41.9~7/RELEASE_ARM64_T8122 machine : x86_64 processor : i386 byteorder : little LC_ALL : None LANG : None LOCALE : en_GB.UTF-8 pandas : 3.0.0.dev0+210.gc3014abb3b numpy : 1.26.3 pytz : 2023.4 dateutil : 2.8.2 setuptools : 68.2.2 pip : 23.3.1 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : 5.1.0 html5lib : None pymysql : None psycopg2 : 2.9.9 jinja2 : 3.1.3 IPython : None pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : None bottleneck : None dataframe-api-compat : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : None odfpy : None openpyxl : 3.1.2 pandas_gbq : None pyarrow : None pyreadstat : None python-calamine : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : None zstandard : None tzdata : 2023.4 qtpy : None pyqt5 : None

phofl commented 9 months ago

cc @WillAyd related to your changes?

luke396 commented 9 months ago

maybe related #57049 and #57178

cguinard-fbx commented 9 months ago

Same error here with an sqlite database (I get the error with pandas 2.2.0 but not with 2.1.4)