pandas-dev / pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
https://pandas.pydata.org
BSD 3-Clause "New" or "Revised" License
42.57k stars 17.56k forks source link

BUG: to_sql does not populate index column with a value when using the mssql+pyodbc engine #59111

Open hb2638 opened 4 days ago

hb2638 commented 4 days ago

Pandas version checks

Reproducible Example

import math
import uuid

import pandas
import sqlalchemy.engine

input = {
    1: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
    2: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
    3: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
    4: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
    5: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
}
df = pandas.DataFrame.from_dict(input, "index")
conn_url = sqlalchemy.engine.URL.create("mssql+pyodbc", query={"odbc_connect": "DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;Trusted_Connection=yes;TrustServerCertificate=yes"})
engine = sqlalchemy.create_engine(conn_url, echo=True)
with engine.connect() as conn:
    with conn.begin():
        df.to_sql(f"not_set_{str(uuid.uuid4())}", conn, index=True)
        conn.rollback()

Issue Description

When calling pandas.DataFrame.to_sql with index set to True, it will create the column for the index but never populates it with data, so it's always null

Below is the output from sql alchemy


2024-06-26 14:54:31,050 INFO sqlalchemy.engine.Engine SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
2024-06-26 14:54:31,051 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-06-26 14:54:31,052 INFO sqlalchemy.engine.Engine SELECT schema_name()
2024-06-26 14:54:31,052 INFO sqlalchemy.engine.Engine [generated in 0.00015s] ()
2024-06-26 14:54:31,052 INFO sqlalchemy.engine.Engine SELECT CAST('test max support' AS NVARCHAR(max))
2024-06-26 14:54:31,053 INFO sqlalchemy.engine.Engine [generated in 0.00013s] ()
2024-06-26 14:54:31,053 INFO sqlalchemy.engine.Engine SELECT 1 FROM fn_listextendedproperty(default, default, default, default, default, default, default)
2024-06-26 14:54:31,053 INFO sqlalchemy.engine.Engine [generated in 0.00011s] ()
2024-06-26 14:54:31,054 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-06-26 14:54:31,065 INFO sqlalchemy.engine.Engine SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] 
FROM [INFORMATION_SCHEMA].[TABLES] 
WHERE ([INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) OR [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max))) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))
2024-06-26 14:54:31,065 INFO sqlalchemy.engine.Engine [generated in 0.00037s] ('BASE TABLE', 'VIEW', 'not_set_125fd112-f19f-44ad-9c49-82be65893ed2', 'dbo')
2024-06-26 14:54:31,067 INFO sqlalchemy.engine.Engine 
CREATE TABLE [not_set_125fd112-f19f-44ad-9c49-82be65893ed2] (
    [('A', 'X')] FLOAT(53) NULL, 
    [('A', 'Y')] FLOAT(53) NULL, 
    [('B', 'X')] FLOAT(53) NULL, 
    [('B', 'Y')] FLOAT(53) NULL
)

2024-06-26 14:54:31,067 INFO sqlalchemy.engine.Engine [no key 0.00017s] ()
2024-06-26 14:54:31,071 INFO sqlalchemy.engine.Engine INSERT INTO [not_set_125fd112-f19f-44ad-9c49-82be65893ed2] ([('A', 'X')], [('A', 'Y')], [('B', 'X')], [('B', 'Y')]) VALUES (?, ?, ?, ?), (?, ?, ?, ?), (?, ?, ?, ?), (?, ?, ?, ?), (?, ?, ?, ?)
2024-06-26 14:54:31,071 INFO sqlalchemy.engine.Engine [generated in 0.00016s (insertmanyvalues) 1/1 (unordered)] (None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None)
2024-06-26 14:54:31,073 INFO sqlalchemy.engine.Engine ROLLBACK

Expected Behavior

The index column should be populated with data

Installed Versions

import pandas as pd pd.show_versions() INSTALLED VERSIONS ------------------ commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140 python : 3.12.1.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.19044 machine : AMD64 processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252 pandas : 2.2.2 numpy : 1.26.4 pytz : 2024.1 dateutil : 2.9.0.post0 setuptools : 70.1.0 pip : 24.1 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 3.2.0 lxml.etree : 5.2.2 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : 3.1.4 IPython : None pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.3 bottleneck : None dataframe-api-compat : None fastparquet : 2024.5.0 fsspec : 2024.6.0 gcsfs : None matplotlib : 3.8.4 numba : None numexpr : None odfpy : None openpyxl : 3.1.3 pandas_gbq : None pyarrow : 16.1.0 pyreadstat : None python-calamine : None pyxlsb : None s3fs : None scipy : 1.11.4 sqlalchemy : 2.0.30 tables : None tabulate : None xarray : None xlrd : None zstandard : None tzdata : 2024.1 qtpy : None pyqt5 : None