mkleehammer / pyodbc

Python ODBC bridge
https://github.com/mkleehammer/pyodbc/wiki
MIT No Attribution
2.92k stars 561 forks source link

Use SQLBindCol when possible #1322

Open ffelixg opened 8 months ago

ffelixg commented 8 months ago

When all columns can be bound and the bandwidth is good enough, this should provide a decent performance improvement, as SQLGetData seems to use a bunch of CPU power. I didn't do extensive benchmarks, but in a small scale cloud web app connected to a decently powerful Azure SQL DB (same region), I got up to a 50% lower execution time on a cursor.execute(...).fetchall() call. Probably 30% is more realistic.

Unfortunately the API allows changing native_uuid and output converters between fetch calls, so the code checks for changes and rebinds when necessary. I've added a test for it.

Some config variables that could be added are a cap on total memory that can be allocated for binding and the threshold at which we use SQLGetData for variable length data. But since only one row is bound this might be unnecessary.

mkleehammer commented 7 months ago

Wow. That would be a really big performance improvement. It's a lot of code to go through, but I'll try to get to is soon.

ffelixg commented 7 months ago

Cool, thanks! Unfortunately I couldn't get native uuids to work on 64bit windows (seems to work fine everywhere else), so I switched back to GetData for that. The point at which it fails is the Py_BuildValue call in GetUUID. I tried a few things, including using the exact same buffer (in heap) for both cases (getting data via SQLGetData vs SQLBindCol+Fetch) and it looks to me like we're getting the same bytes both ways. Maybe there is some memory alignment issue I'm not understanding. I guess I'm also not sure why PYSQLGUID is defined the way it is and not just 16 bytes, maybe that has something to do with it.

ludekmatousek commented 7 months ago

Hello,

I can add some note to performance ....

for last couple of months I'm trying to solve performance issue when downloading the data from IBM iSeries (AS400/ IBM i). Transfer works fine with the files with simple structure. In case of 10+ columns performance drastically degraded. During the really log way across the AS400 TCP/buffer settings, networking department analyzing PCAPs, examination of active devices on the network way, playing with the ODBC buffers/LOB and many others, we ended with IBM support ....

IBM support analyzed ODBC trace and missing SQLBindCol resulting in "In the end about 80% of the time is spent in the application, about 5% on the IBM i and the rest is the network".

Looking forward for new version :)

ludekmatousek commented 7 months ago

Package built from ffelixg:bindcol and tested against the IBM iSeries (AS400/ IBM i) with positive results. Left side measurements with standard pyodbc, right side measurements with the new wheel, IBM cwbtrace as a proof of usage SQLBindCol(): SQLBind_pyodbc

ffelixg commented 6 months ago

Thank you for the benchmark! I suppose a cursor attribute to track how many columns were bound would be useful.

I've rewritten GetUUID to build the tuple manually instead of using Py_BuildValue, since that was behaving in funny ways. Now binding works for that too.

I've also tried another approach to integrate SQLBindCol in the BindCol_rowwise branch. The idea is that since we need to pre calculate all of the c types/buffer sizes for SQLBindCol anyway, we can store it inside ColumnInfo and abstract out SQLGetData to GetData from the individual Get Functions. As a consequence, most of the Get functions collapse to just a few lines. I think that this way, SQLBindCol and SQLGetData play together in a more natural way, but it's definitely more change to existing code.

This way it was also fairly straight forward to implement fetching arrays in case all columns can be bound (using rowwise binding - felt more natural, since we're returning rows). This seems to mostly just affect narrow fetches though and the performance gain definitely less than binding in the first place.

Curious to hear any thoughts.