waliwali / ibm-db

Automatically exported from code.google.com/p/ibm-db

Major latency issues with remote fetches #19

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Use PyDB2 to execute a query against a (fairly distant) remote database
2. Use ibm_db to execute the same query against the same remote database
3. Compare the times

What is the expected output? What do you see instead?

I've been testing my application (which can use pyodbc, PyDB2, or ibm_db 
to connect to DB2). The 0.4.0 release of the ibm_db driver seems to have 
fixed the crashing issues I was experiencing with the driver previously, 
and when used against a local database the driver performs tolerably well 
(a couple of seconds slower than PyDB2, but no big deal).

However, when used against a remote database the difference is /vast/. 
Specifically, when querying a database with a ping time of 140ms, ibm_db 
is 4 orders of magnitude slower than PyDB2; so slow, it's effectively 
unusable. Here's the result of a quick test script I bashed together to 
compare the two drivers (I've attached the script):

Testing PyDB2
Executed query in 0:00:00.151270
Fetched 138 rows in 0:00:00.001147
Mean row fetch time: 0:00:00.000010
Median row fetch time: 0:00:00.000010

Testing ibm_db_dbi
Executed query in 0:00:00.577083
Fetched 138 rows in 0:00:19.619216
Mean row fetch time: 0:00:00.142121
Median row fetch time: 0:00:00.141764
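
The numbers above are consistent with ibm_db paying one full network 
round trip per fetched row. A quick back-of-the-envelope check (plain 
Python, no DB2 needed, using only the figures reported above):

```python
# Measurements from the test run above.
ping_s = 0.140                # round-trip time to the remote server
rows = 138
ibm_db_fetch_s = 19.619216    # measured total fetch time, ibm_db_dbi
pydb2_fetch_s = 0.001147      # measured total fetch time, PyDB2

# If ibm_db pays one network round trip per row, the total fetch time
# should be roughly rows * ping time -- and it is (19.32 s predicted
# vs 19.62 s measured).
predicted_s = rows * ping_s
print("predicted %.2f s vs measured %.2f s" % (predicted_s, ibm_db_fetch_s))

# PyDB2's ~10 us/row cost implies it buffers rows client-side and does
# essentially no network traffic during the fetch loop.
print("slowdown factor: ~%d" % (ibm_db_fetch_s / pydb2_fetch_s))
```

At a 140 ms ping, the round trips alone account for almost all of the 
19.6 s fetch time.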

What version of the product are you using? On what operating system?

The above results were from comparing ibm_db 0.4.0 to PyDB2 1.1.1 on 
Ubuntu Linux Server 7.10 for AMD64 with DB2 9.5 Enterprise Server Edition. 
The target database for the results above was a DB2 8 for z/OS instance. 
I've also tested the script against DB2 8 on AIX 64-bit, DB2 9.5 on Linux 
64-bit, and DB2 8 on Windows 32-bit. All showed performance degradation, 
with the amount being primarily dictated by the roundtrip time to the 
server.

Please provide any additional information below.

I haven't dug into this side of the fetching code yet, but I'll try to 
have a look this week. Either way, there's definitely something seriously 
wrong in there that will need fixing before the driver will be practical 
for any kind of heavy remote work. For example: I currently find myself 
in the amusing situation that my application will take 8 minutes to fetch 
a pretty large dataset from the remote z/OS instance with PyDB2 ... and 
approximately 3 months with ibm_db!! (assuming I've done my math right :-)
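
As a sanity check on that estimate (my arithmetic, not the reporter's, 
using the per-row times measured above), scaling the 8-minute PyDB2 
fetch by the measured per-row slowdown does land in the months range:

```python
# Per-row fetch times from the measurements above.
pydb2_row_s = 0.000010     # ~10 us per row with PyDB2
ibm_db_row_s = 0.142121    # ~142 ms per row with ibm_db

slowdown = ibm_db_row_s / pydb2_row_s         # ~14,000x
pydb2_minutes = 8.0                           # reported PyDB2 fetch time
ibm_db_days = pydb2_minutes * slowdown / (60 * 24)
print("estimated ibm_db time: ~%.0f days (~%.1f months)"
      % (ibm_db_days, ibm_db_days / 30))
```

That works out to roughly 79 days, i.e. between 2 and 3 months, so the 
order of magnitude of the claim holds up.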

Original issue reported on code.google.com by wavefor...@gmail.com on 15 Oct 2008 at 12:01

Attachments:

GoogleCodeExporter commented 9 years ago

Original comment by abhigyan...@in.ibm.com on 22 Oct 2008 at 5:31

GoogleCodeExporter commented 9 years ago
[deleted comment]
GoogleCodeExporter commented 9 years ago
Same here. Fetching rows is very slow. Example code:

import time
import ibm_db_dbi

# (here should be config code)

db = ibm_db_dbi.connect(cfg.dbhost, cfg.dbuser, cfg.dbpass)
c = db.cursor()

sel = "SELECT name FROM product;" # about 9500 rows

# test 1: using fetchone()
tm_o = time.time()
tm_n = time.time()
print 'start test 1: %s' % (tm_n - tm_o)

c.execute(sel)

tm_o = tm_n
tm_n = time.time()
print 'execute: %s' % (tm_n - tm_o)

while 1:
    ret = c.fetchone()
    if ret is None:
        break

tm_o = tm_n
tm_n = time.time()
print 'fetch by one: %s' % (tm_n - tm_o)

# test 2: using fetchall()
tm_o = time.time()
tm_n = time.time()
print 'start test 2: %s' % (tm_n - tm_o)

c.execute(sel)

tm_o = tm_n
tm_n = time.time()
print 'execute: %s' % (tm_n - tm_o)

ret = c.fetchall()

tm_o = tm_n
tm_n = time.time()
print 'fetch all: %s' % (tm_n - tm_o)

returns:

start test 1: 0.0
execute: 0.530999898911
fetch by one: 224.213000059
start test 2: 0.0
execute: 0.578000068665
fetch all: 194.164999962

I'm using IBM DB2 v8 on Red Hat. Python is running on the local machine: 
Windows XP SP3 32-bit, Python 2.6, ibm_db 1.0, IBM DB2 client v9.

Original comment by mynth...@gmail.com on 26 Nov 2009 at 11:54

GoogleCodeExporter commented 9 years ago
ibm_db_dbi.py
_fetch_helper() line 1195

This is what causes fetch to be slow:
row = ibm_db.fetch_tuple(self.stmt_handler)

Unfortunately I have no C skills to check whether it can be made faster.

Original comment by mynth...@gmail.com on 26 Nov 2009 at 1:33
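
A toy model (pure Python, not the real driver) shows why a per-row 
`fetch_tuple` call hurts so badly over a slow link: fetching one row 
at a time pays the full ping for every row, while block fetching 
amortizes it. The simulation below just counts network round trips, 
assuming the server can hand back rows in blocks:

```python
class FakeServer:
    """Toy stand-in for a remote DB2 server; every call is one round trip."""
    def __init__(self, total_rows, latency_s):
        self.total_rows = total_rows
        self.latency_s = latency_s
        self.round_trips = 0

    def fetch_block(self, n):
        """Return up to n rows, paying one network round trip."""
        self.round_trips += 1
        batch = min(n, self.total_rows)
        self.total_rows -= batch
        return [("row",)] * batch

def fetch_one_by_one(server):
    # What a naive per-row fetch loop effectively does.
    rows = []
    while True:
        block = server.fetch_block(1)
        if not block:
            break
        rows.extend(block)
    return rows

def fetch_buffered(server, block_size=100):
    # Pull a block per round trip instead of a single row.
    rows = []
    while True:
        block = server.fetch_block(block_size)
        if not block:
            break
        rows.extend(block)
    return rows

latency = 0.140  # 140 ms ping, as in the original report
s1 = FakeServer(9500, latency)   # 9500 rows, as in the SELECT above
r1 = fetch_one_by_one(s1)
s2 = FakeServer(9500, latency)
r2 = fetch_buffered(s2)
print("one-by-one: %d trips, ~%.0f s" % (s1.round_trips, s1.round_trips * latency))
print("buffered:   %d trips, ~%.0f s" % (s2.round_trips, s2.round_trips * latency))
```

For the 9500-row query this is 9501 round trips (over 22 minutes of 
pure latency at 140 ms, matching the ~224 s seen on a faster link) 
versus 96 round trips when buffered.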

GoogleCodeExporter commented 9 years ago
I have made some changes in the dbi wrapper to increase performance. This 
should also decrease latency.

The modified dbi wrapper is in attachment of
http://groups.google.com/group/ibm_db/browse_thread/thread/2f6bdffa68524ce1
discussion thread. 

Please download the dbi wrapper and replace the existing ibm_db_dbi.py 
with it (at location “../site-packages/ibm_db.py2.x-xxx.egg/”).

Give it a try and let me know how it works.

Original comment by rahul.pr...@in.ibm.com on 8 Apr 2010 at 6:51
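
The modified wrapper itself is only available from the linked thread; 
the sketch below shows the general shape of such a fix (client-side row 
buffering in the DBI layer), not the actual patch. `fetch_block` is a 
hypothetical stand-in for whatever low-level call retrieves a batch of 
rows; the real ibm_db API may differ:

```python
class BufferedCursor:
    """Sketch: a DBI-layer cursor serving fetchone() from a local buffer.

    fetch_block(n) is a hypothetical callable returning up to n rows per
    network round trip; it is not a real ibm_db function.
    """
    def __init__(self, fetch_block, block_size=100):
        self._fetch_block = fetch_block
        self._block_size = block_size
        self._buffer = []
        self._exhausted = False

    def fetchone(self):
        # Refill the buffer one block at a time instead of paying a
        # round trip for every single row.
        if not self._buffer and not self._exhausted:
            block = self._fetch_block(self._block_size)
            if block:
                self._buffer = list(block)
            else:
                self._exhausted = True
        return self._buffer.pop(0) if self._buffer else None

# Demo against a dummy 250-row result set.
rows = [("row", i) for i in range(250)]
calls = {"n": 0}

def dummy_fetch_block(n):
    calls["n"] += 1          # each call models one network round trip
    batch = rows[:n]
    del rows[:n]
    return batch

cur = BufferedCursor(dummy_fetch_block, block_size=100)
out = []
while True:
    r = cur.fetchone()
    if r is None:
        break
    out.append(r)
print("%d rows in %d round trips" % (len(out), calls["n"]))
```

Here the 250 rows cost 4 round trips (three blocks plus one empty probe) 
instead of 251 with a one-row-per-call loop, which is the kind of 
reduction that would close the gap reported in this issue.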

GoogleCodeExporter commented 9 years ago
Fixed in ibm_db-1.0.2

Original comment by rahul.pr...@in.ibm.com on 30 Apr 2010 at 6:02