zillow / ctds

Python DB-API 2.0 library for MS SQL Server
MIT License
83 stars 12 forks source link

For bulk_insert, expose "order" property? #82

Open amachanic opened 3 years ago

amachanic commented 3 years ago

Hello,

Are you able to expose the "order" property for bulk_insert, to tell SQL Server that the input rows are ordered the same as the clustered index? This can eliminate a server-side sort and really speed up some inserts.

I found the option in the FreeTDS docs, here:

https://www.freetds.org/reference/a00547.html#gafec3feb5bac1cb33545ba01efb0a67b8

Thanks!

joshuahlang commented 3 years ago

I think this will be fairly straight forward and I can add it. Just need to get CI/CD working fully on Github Actions now that travis-ci is no longer viable.

joshuahlang commented 3 years ago

Looked into this tonight and the FreeTDS support for BULK INSERT hints is very poor. It appears it only supports passing one hint option and the ORDER hint, which requires the column name unless I'm misreading the docs, just doesn't work as there's no way to specify the ColumnName

amachanic commented 3 years ago

I really appreciate your looking into it, @joshuahlang!

I'm confused regarding the passing only "one hint"; doesn't cTDS allow specification of both rows per batch and tablock simultaneously? Or is only one of them actually used?

joshuahlang commented 3 years ago

The batch_size isn't passed as a hint via bcp_options, but ctds can/would support multiple hint options. The problem is FreeTDS' bcp_options doesn't support it. You'll notice in the FreeTDS source code that each call to bcp_options overwrites any previously set dbproc->bcpinfo->hint value. The last set hint via bcp_options is the only one passed. And as I mentioned previously, it isn't possible to provide the column name for the ORDER hint. A quick scan of the FreeTDS source reveals this functionality doesn't have any tests written. I'd guess no one ever tested it (the ORDER hint) cause it doesn't appear to work unless I'm missing something