Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import math
import typing
import uuid
import pandas
import sqlalchemy
import sqlalchemy.engine
def insert_callback(table: pandas.io.sql.SQLTable, conn: sqlalchemy.engine.base.Connection, keys: list[str], data_iter: typing.Iterable[tuple]) -> None:
data = [list(d) for d in data_iter]
print(f"{data=}")
print(f"{keys=}")
print(f"{[c.name for c in table.table.columns]=}")
sql = f"""
INSERT INTO [{table.name}]({", ".join(f"[{k}]" for k in keys)})
VALUES ({", ".join(f":{i}" for i, _ in enumerate(keys))})
""".strip()
for values in data:
conn.execute(sqlalchemy.text(sql), {str(i): v for i, v in enumerate(values)})
input = {
1: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
2: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
3: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
4: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
5: {('A', 'X'): math.nan, ('A', 'Y'): math.nan, ('B', 'X'): math.nan, ('B', 'Y'): math.nan},
}
df = pandas.DataFrame.from_dict(input, "index")
conn_url = sqlalchemy.engine.URL.create("mssql+pyodbc", query={"odbc_connect": "DRIVER={ODBC Driver 18 for SQL Server};SERVER=.;Trusted_Connection=yes;TrustServerCertificate=yes"})
engine = sqlalchemy.create_engine(conn_url, echo=True)
with engine.connect() as conn:
with conn.begin():
df.to_sql(f"insert_callback_{str(uuid.uuid4())}", conn, index=True, method=insert_callback)
conn.rollback()
Issue Description
When calling pandas.DataFrame.to_sql with index set to True, it gives the the wrong name for the SQL column used for the index.
E.x.: In the code I provided, it creates a column called index as the index but it passed ('index', '') as the index column name in the list of columns.
Below is the output from sql alchemy
2024-06-26 16:02:23,136 INFO sqlalchemy.engine.Engine SELECT CAST(SERVERPROPERTY('ProductVersion') AS VARCHAR)
2024-06-26 16:02:23,136 INFO sqlalchemy.engine.Engine [raw sql] ()
2024-06-26 16:02:23,137 INFO sqlalchemy.engine.Engine SELECT schema_name()
2024-06-26 16:02:23,137 INFO sqlalchemy.engine.Engine [generated in 0.00014s] ()
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine SELECT CAST('test max support' AS NVARCHAR(max))
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine [generated in 0.00013s] ()
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine SELECT 1 FROM fn_listextendedproperty(default, default, default, default, default, default, default)
2024-06-26 16:02:23,138 INFO sqlalchemy.engine.Engine [generated in 0.00011s] ()
2024-06-26 16:02:23,139 INFO sqlalchemy.engine.Engine BEGIN (implicit)
2024-06-26 16:02:23,151 INFO sqlalchemy.engine.Engine SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME]
FROM [INFORMATION_SCHEMA].[TABLES]
WHERE ([INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max)) OR [INFORMATION_SCHEMA].[TABLES].[TABLE_TYPE] = CAST(? AS NVARCHAR(max))) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] = CAST(? AS NVARCHAR(max)) AND [INFORMATION_SCHEMA].[TABLES].[TABLE_SCHEMA] = CAST(? AS NVARCHAR(max))
2024-06-26 16:02:23,151 INFO sqlalchemy.engine.Engine [generated in 0.00032s] ('BASE TABLE', 'VIEW', 'insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06', 'dbo')
2024-06-26 16:02:23,153 INFO sqlalchemy.engine.Engine
CREATE TABLE [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06] (
[index] BIGINT NULL,
[('A', 'X')] FLOAT(53) NULL,
[('A', 'Y')] FLOAT(53) NULL,
[('B', 'X')] FLOAT(53) NULL,
[('B', 'Y')] FLOAT(53) NULL
)
2024-06-26 16:02:23,153 INFO sqlalchemy.engine.Engine [no key 0.00014s] ()
2024-06-26 16:02:23,155 INFO sqlalchemy.engine.Engine CREATE INDEX [ix_insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06_index] ON [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06] ([index])
2024-06-26 16:02:23,155 INFO sqlalchemy.engine.Engine [no key 0.00012s] ()
data=[[1, None, None, None, None], [2, None, None, None, None], [3, None, None, None, None], [4, None, None, None, None], [5, None, None, None, None]]
keys=["('index', '')", "('A', 'X')", "('A', 'Y')", "('B', 'X')", "('B', 'Y')"]
[c.name for c in table.table.columns]=['index', "('A', 'X')", "('A', 'Y')", "('B', 'X')", "('B', 'Y')"]
2024-06-26 16:02:23,158 INFO sqlalchemy.engine.Engine INSERT INTO [insert_callback_5ff921a3-15b6-474b-af16-db1d7e243f06]([('index', '')], [('A', 'X')], [('A', 'Y')], [('B', 'X')], [('B', 'Y')])
VALUES (?, ?, ?, ?, ?)
2024-06-26 16:02:23,158 INFO sqlalchemy.engine.Engine [generated in 0.00016s] (1, None, None, None, None)
2024-06-26 16:02:23,160 INFO sqlalchemy.engine.Engine ROLLBACK
Expected Behavior
The sql column name of the index should be in the list of keys/columns provided to the call back.
Pandas version checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
When calling pandas.DataFrame.to_sql with index set to True, it gives the the wrong name for the SQL column used for the index.
E.x.: In the code I provided, it creates a column called index as the index but it passed ('index', '') as the index column name in the list of columns.
Below is the output from sql alchemy
Expected Behavior
The sql column name of the index should be in the list of keys/columns provided to the call back.
Installed Versions