tmolomby / pyodbc

Automatically exported from code.google.com/p/pyodbc
MIT No Attribution
0 stars 0 forks source link

slow handling of executemany #250

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Please refer to this stackoverflow link (I am the asker):

http://stackoverflow.com/a/9932361

Apparently ceODBC doesn't have this issue, though I haven't tested it yet.

Original issue reported on code.google.com by sok...@gmail.com on 29 Mar 2012 at 11:32

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Tried both ceODBC and mxODBC and both were also painfully slow.  Ended up going 
with an adodb connection with help from http://www.ecp.cc/pyado.html

        comConn = win32com.client.Dispatch(r'ADODB.Connection')
    DSN = 'PROVIDER=Microsoft.Jet.OLEDB.4.0;DATA SOURCE=%s%s' %(dbDIR,dbOut)
    comConn.Open(DSN)

    rs = win32com.client.Dispatch(r'ADODB.Recordset')
    rs.Open('[' + tblName +']', comConn, 1, 3)

    for f in values:
        rs.AddNew(LVI_LST, f)

Original comment by jaycgree...@gmail.com on 5 Apr 2012 at 9:02

GoogleCodeExporter commented 9 years ago
The behavior I encountered last time was as described by ghoerz in the 
aforementioned stackoverflow link, i.e.

    for each set of parameters
        prepare and execute the insert statement
        unprepare the statement
        commit

However, now that I try again, I couldn't reproduce that behavior anymore with 
any version of pyodbc from 2.1.8 to 3.0.6. Instead, it is now:

    for each set of parameters
        prepare and execute the insert statement
    unprepare the statement
    commit

This gives good enough performance.

Does anyone have the environment that can reproduce the slow behavior described 
by ghoerz? If not, I think we can close the issue. I am sorry that I wasn't 
diligent enough to note it down when reporting the issue.

Original comment by sok...@gmail.com on 31 Dec 2012 at 6:07

GoogleCodeExporter commented 9 years ago
Verified resolved with pyodbc-2.1.9.win32-py2.7

Original comment by bryan.ea...@gmail.com on 2 Jan 2013 at 8:44

Attachments:

GoogleCodeExporter commented 9 years ago
I'm still finding a very large difference in the performance of the 
'executemany' command between the pyodbc module and the ceODBC module.  In a 
nutshell, ceODBC is 35 times faster doing bulk inserts over pyodbc.

In my case, I'm inserting 250,000 records into a SQL Server 2008 R2 database 
(on its own server), using Python 2.7 on a Windows 7 (64-bit) PC, using the 
'{SQL Server Native Client 10.0}' driver.  I'm batching the records up into 
sets of 10,000 records before running 'executemany'.  This takes 12.5 minutes 
with pyodbc, but just 21 seconds with ceODBC.  A huge difference.

For reference, with this test, pyodbc is about 30% slower than the pymssql 
module.

Can you think of a reason why the pyodbc module should be so slow?  Is there 
something I should be doing in my Python code?

Original comment by erski...@dnc.org on 5 Aug 2013 at 1:36

GoogleCodeExporter commented 9 years ago
I don't think this issue is resolved. 

I'm using the most recent pyodbc-3.0.7-py2.7 and have found the same 
phenomenon. The same python code for pyodbc takes minutes, but for ceODBC takes 
only seconds. Version is ceODBC-2.0.1-py2.7. Critical code is

    conn = ceODBC.connect(conninfo, autocommit = True)
    cur = conn.cursor()
    cur.executemany(UPSERT_SQL, DATA)
    conn.close()

The executemany implementation of ceODBC must be many factors more efficient. 
If someone tells me how to do the profiling, I'm happy to help. The database I 
use is SAP HANA and for convenience, I use a 32-bit ODBC driver.

--Richard

Original comment by rlindne...@gmail.com on 29 Oct 2013 at 12:33

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
I'm also having trouble with pyODBC performance.

I tested  1000 small rows into Netezza.

ceODBC executemany: 0.3 seconds
pyodbc executemany: 180-195 seconds
pyodbc execute: 78-130 seconds

Python 2.7 32-bit, pyodbc 3.0.7, ceodbc 2.0.1, Windows 7 64-bit, Netezza 
Release 7.0 (P-1) [Build 26407]

Original comment by ahz...@gmail.com on 18 Sep 2014 at 4:19

GoogleCodeExporter commented 9 years ago
If need to update 300.000 columns it never finishes. This bug really needs to 
be fixed.

Original comment by Atoc....@gmail.com on 30 Nov 2014 at 5:07

GoogleCodeExporter commented 9 years ago
AFAICS, Cursor_executemany is binding a single parameter and calling execute 
(which calls SQLExecute).

In ceODBC, it binds all parameters and then calls SQLExecute.

Am I reading this correctly?

Original comment by javier.r...@gmail.com on 15 Jan 2015 at 1:46

GoogleCodeExporter commented 9 years ago
I tested by running an insert statement for multiple rows of data using 
executemany.  In the SQL Server Profiler of the run, it showed every row being 
inserted using the sp_prepexec stored procedure.  This is causing implicit 
unprepares and explicit prepares of the query over and over again.  What I 
believe it should do is call sp_prepexec only for the first row (which prepares 
the query and executes the first row), and then call sp_execute for all 
subsequent rows in the batch (which will use the existing prepared query).

At the end of the batch, executemany does call an explicit sp_unprepare, which 
is fine, but I believe these implicit unprepares and explicit prepares are what 
is killing performance.  (You can see these in Profiler if you turn on the 
"Prepare SQL" and "Unprepare SQL" events in the "TSQL" events heading.)

Original comment by rich...@gmail.com on 10 Feb 2015 at 6:40

Attachments:

atocyo commented 7 years ago

Will this be ever fixed?