mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution

Question: Is there any way to use BULK INSERT for SQL Server? #990

Closed AlexeySumin closed 2 years ago

AlexeySumin commented 2 years ago

Hello! I am looking for the fastest way to insert large amounts of data into SQL Server from queues, files, etc. using Python. At the moment I am using https://github.com/zillow/ctds because it supports BULK INSERT: https://zillow.github.io/ctds/bulk_insert.html#
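
For reference, this is roughly how I call it today; the server, credentials, and table below are placeholders:

```python
import ctds

# Placeholder server/credentials; ctds connects through FreeTDS.
conn = ctds.connect(
    'my-sql-server',
    user='my_user',
    password='my_password',
    database='my_db',
)
try:
    rows = [(1, 'alpha'), (2, 'beta')]
    # bulk_insert() streams the rows over the TDS bulk-load protocol
    # and returns the number of rows saved to the table.
    inserted = conn.bulk_insert('dbo.MyTable', rows)
    print(inserted, 'rows inserted')
finally:
    conn.close()
```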

cTDS is not a widely used project, and nobody knows whether it will continue to be maintained. Microsoft directly recommends using pyodbc: https://docs.microsoft.com/en-us/sql/connect/python/python-driver-for-sql-server?view=sql-server-ver15

Searching this repo for keywords like 'bulk' did not turn anything up.

So, my questions are: Does pyodbc support bulk insert? Is there an equivalent or better alternative?

gordthompson commented 2 years ago

Have you seen this?

https://stackoverflow.com/q/29638136/2144390
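
For anyone landing here: pyodbc can also issue a plain T-SQL BULK INSERT statement when the data file is visible to the SQL Server machine itself. A minimal sketch, assuming SQL Server 2017+ (required for FORMAT = 'CSV'); connection string, table, and path are placeholders:

```python
import pyodbc

# Placeholder connection string; note that the CSV path is resolved on
# the SQL Server host, not on the client running this script.
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=my-sql-server;DATABASE=my_db;UID=my_user;PWD=my_password',
    autocommit=True,
)
# The server reads the file and performs the bulk load itself.
conn.execute(
    "BULK INSERT dbo.MyTable "
    "FROM 'C:\\data\\mytable.csv' "
    "WITH (FORMAT = 'CSV', FIRSTROW = 2)"
)
conn.close()
```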

gizmo93 commented 2 years ago

@AlexeySumin We are using cTDS at our company when we need "real" TDS BULK INSERTs in Python, and it works well. The author is responsive when issues come up. It is a little hard to install on Windows, though, as there are (were?) no wheel files available.

pymssql and pytds (https://github.com/denisenkom/pytds) also support BULK INSERTs. pymssql and cTDS are both based on FreeTDS; pytds is pure Python, but it caused higher CPU load in our tests.
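
For pytds this goes through Cursor.copy_to(), which its docs mark as experimental. A rough sketch under that assumption, with placeholder names:

```python
import pytds

# Placeholder connection details; pytds is pure Python, no FreeTDS needed.
conn = pytds.connect(
    'my-sql-server',
    database='my_db',
    user='my_user',
    password='my_password',
)
cursor = conn.cursor()
rows = [(1, 'alpha'), (2, 'beta')]
# copy_to() sends the rows via the TDS BULK COPY protocol.
cursor.copy_to(
    data=rows,
    table_or_view='MyTable',
    schema='dbo',
    columns=('id', 'name'),
)
conn.commit()
conn.close()
```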

pyodbc's fast_executemany is okay for most cases, but it causes a lot of batch insert requests to be made on the SQL Server, and it tends to be much slower when inserting data with many columns, especially many string columns. (We had to rewrite some of our CSV import tasks to use cTDS's BULK INSERTs because they took forever with pyodbc's fast_executemany, but those were very wide datasets.)
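
That said, for moderate loads fast_executemany is a one-line switch. A minimal sketch (connection string and table are placeholders):

```python
import pyodbc

# Placeholder connection string.
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=my-sql-server;DATABASE=my_db;UID=my_user;PWD=my_password'
)
cursor = conn.cursor()
# With fast_executemany, executemany() binds the parameters as an ODBC
# parameter array instead of issuing one round trip per row.
cursor.fast_executemany = True
rows = [(1, 'alpha'), (2, 'beta')]
cursor.executemany('INSERT INTO dbo.MyTable (id, name) VALUES (?, ?)', rows)
conn.commit()
conn.close()
```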