zillow / ctds

Python DB-API 2.0 library for MS SQL Server
MIT License

Handling of auto increment pk while using bulk insert #59

Closed gizmo93 closed 4 years ago

gizmo93 commented 4 years ago

Having a table like this:

```sql
CREATE TABLE FancyTestTable (
    Id INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
    Number1 INT NOT NULL,
    Number2 INT NOT NULL
)
```

and trying to insert data using bulk insert from a dataset like the following:

```python
data = [
    {'Number1': 1, 'Number2': 2},
    {'Number1': 2, 'Number2': 3},
    {'Number1': 3, 'Number2': 4},
]
```

leads to problems because ctds tries to insert the Id column with a NULL value instead of simply ignoring it and letting SQL Server do its work (incrementing the column). It might be better to build the generated INSERT query in the dict case using only the dict's keys as column names.
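The suggestion above can be sketched in plain Python: derive the column list for the generated INSERT statement from the dict keys alone, so a column absent from the row (such as the IDENTITY column) is never referenced. `build_insert_sql` is a hypothetical helper for illustration, not part of the ctds API.

```python
# Hypothetical sketch of the suggested behavior: name only the columns
# present in the dict row, so an IDENTITY column that the caller omits
# never appears in the statement and SQL Server assigns it.
def build_insert_sql(table, row):
    """Build an INSERT statement naming only the columns present in `row`."""
    columns = sorted(row.keys())  # deterministic order for the example
    column_list = ', '.join(columns)
    placeholders = ', '.join(':{0}'.format(i) for i in range(len(columns)))
    return 'INSERT INTO {0} ({1}) VALUES ({2})'.format(
        table, column_list, placeholders
    )

row = {'Number1': 1, 'Number2': 2}

# The Id IDENTITY column never appears in the generated SQL.
sql = build_insert_sql('FancyTestTable', row)
```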

joshuahlang commented 4 years ago

Have you looked at using https://zillow.github.io/ctds/cursor.html#ctds.Cursor.executemany

gizmo93 commented 4 years ago

Yes, but executemany is much slower than the pyodbc implementation with fast_executemany=True. It takes forever to insert a million rows, and executemany issues many batch requests against the SQL Server, which is why we prefer bulk inserts.

joshuahlang commented 4 years ago

Sorry I haven't responded sooner. I think this is easily addressable if ctds simply ignores IDENTITY columns and never passes anything for them. Does that seem reasonable? It could also raise a warning if the caller attempts to specify the identity column in a dict row.
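A minimal sketch of that idea, outside ctds itself: strip identity columns from each dict row and warn when the caller supplied one anyway. `filter_identity_columns` and the `identity_columns` set are hypothetical names for illustration, not the actual ctds implementation.

```python
import warnings


def filter_identity_columns(row, identity_columns):
    """Return `row` without identity columns, warning if any were supplied.

    Hypothetical helper illustrating the proposed fix; not ctds code.
    """
    supplied = identity_columns & row.keys()
    if supplied:
        warnings.warn(
            'ignoring identity column(s): {0}'.format(', '.join(sorted(supplied)))
        )
    return {k: v for k, v in row.items() if k not in identity_columns}


row = {'Id': None, 'Number1': 1, 'Number2': 2}
cleaned = filter_identity_columns(row, {'Id'})  # warns, then drops 'Id'
```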

joshuahlang commented 4 years ago

Should be fixed in 1.11.0