Closed lrhazi closed 5 years ago
I found out my issue occurs not during "connect", but during "execute". What's odd is that when I start my script, with python -i, it fails, and then I simply recall the same funtion that failed, with the same sql statement, and most of the time, it works fine! Still, additional debugging log would be maybe useful. Thanks a lot.
You can use ODPI-C debugging to get some additional information. In particular logging levels 4, 8 and 16 would be of most use with cx_Oracle. The error and your description of the problem sounds like it might be a network stability issue, though.
Thanks a lot Anthony. My question has been answered, but am gonna try to get more :)
am getting this output, any idea where to look next? I am afraid db server side? would a packet capture help me much?
ODPI [00423] 2019-05-21 14:50:34.847: fn start dpiContext_create
ODPI [00423] 2019-05-21 14:50:34.874: fn end dpiContext_create -> 0
ODPI [00423] 2019-05-21 14:50:34.874: fn start dpiContext_getClientVersion(0x2759190)
ODPI [00423] 2019-05-21 14:50:34.875: fn end dpiContext_getClientVersion(0x2759190) -> 0
ODPI [00423] 2019-05-21 14:50:34.875: fn start dpiContext_initCommonCreateParams(0x2759190)
ODPI [00423] 2019-05-21 14:50:34.875: fn end dpiContext_initCommonCreateParams(0x2759190) -> 0
ODPI [00423] 2019-05-21 14:50:34.875: fn start dpiContext_initConnCreateParams(0x2759190)
ODPI [00423] 2019-05-21 14:50:34.875: fn end dpiContext_initConnCreateParams(0x2759190) -> 0
ODPI [00423] 2019-05-21 14:50:34.875: fn start dpiConn_create(0x2759190)
ODPI [00423] 2019-05-21 14:50:35.041: fn end dpiConn_create(0x2759190) -> 0
ODPI [00423] 2019-05-21 14:50:35.041: fn start dpiConn_getEncodingInfo(0x2711fa0)
ODPI [00423] 2019-05-21 14:50:35.041: fn end dpiConn_getEncodingInfo(0x2711fa0) -> 0
DB: Got connected.
DB: Got cursor.
ODPI [00423] 2019-05-21 14:50:35.042: fn start dpiConn_prepareStmt(0x2711fa0)
ODPI [00423] 2019-05-21 14:50:35.042: SQL SELECT DISTINCT C.EMAIL_ADDR
FROM PS_EMAIL_ADDRESSES C,
PS_JOB @OPSUSER_CS_HR D
WHERE C.PREF_EMAIL_FLAG = 'Y'
AND C.EMAIL_ADDR <> ' '
AND C.EMPLID = D.EMPLID
AND D.EFFDT =
(SELECT MAX(D_ED.EFFDT) FROM PS_JOB @OPSUSER_CS_HR D_ED
WHERE D.EMPLID = D_ED.EMPLID
AND D.EMPL_RCD = D_ED.EMPL_RCD
AND D_ED.EFFDT <= SYSDATE)
AND D.EFFSEQ =
(SELECT MAX(D_ES.EFFSEQ) FROM PS_JOB @OPSUSER_CS_HR D_ES
WHERE D.EMPLID = D_ES.EMPLID
AND D.EMPL_RCD = D_ES.EMPL_RCD
AND D.EFFDT = D_ES.EFFDT)
AND D.EMPL_STATUS IN ('A','L','P','S')
AND D.PAYGROUP = 'FRF'
ORDER BY C.EMAIL_ADDR
-- FETCH FIRST 10 ROWS ONLY
ODPI [00423] 2019-05-21 14:50:35.042: fn end dpiConn_prepareStmt(0x2711fa0) -> 0
ODPI [00423] 2019-05-21 14:50:35.042: fn start dpiStmt_getInfo(0x24e1250)
ODPI [00423] 2019-05-21 14:50:35.042: fn end dpiStmt_getInfo(0x24e1250) -> 0
ODPI [00423] 2019-05-21 14:50:35.042: fn start dpiStmt_setFetchArraySize(0x24e1250)
ODPI [00423] 2019-05-21 14:50:35.042: fn end dpiStmt_setFetchArraySize(0x24e1250) -> 0
ODPI [00423] 2019-05-21 14:50:35.042: fn start dpiStmt_execute(0x24e1250)
########## <long time goes by>
ODPI [00423] 2019-05-21 14:55:36.376: OCI error ORA-03135: connection lost contact
Process ID: 8580
Session ID: 4 Serial number: 61666 (dpiStmt_execute / execute)
ODPI [00423] 2019-05-21 14:55:36.376: fn end dpiStmt_execute(0x24e1250) -> -1
t=group-mgmt-test m=oracle p=ERROR ORA-03135: connection lost contact
Process ID: 8580
Session ID: 4 Serial number: 61666
ODPI [00423] 2019-05-21 14:55:36.378: fn start dpiStmt_release(0x24e1250)
ODPI [00423] 2019-05-21 14:55:36.378: fn end dpiStmt_release(0x24e1250) -> 0
ODPI [00423] 2019-05-21 14:55:36.378: fn start dpiConn_release(0x2711fa0)
ODPI [00423] 2019-05-21 14:55:36.379: fn end dpiConn_release(0x2711fa0) -> 0
Whats really odd is that, like I said, if I stayed in the same Python interpreter instance and right after the error, call back my code, it just works:
ORDER BY C.EMAIL_ADDR
-- FETCH FIRST 10 ROWS ONLY
ODPI [00423] 2019-05-21 14:57:55.188: fn end dpiConn_prepareStmt(0x2859710) -> 0
ODPI [00423] 2019-05-21 14:57:55.188: fn start dpiStmt_getInfo(0x25104e0)
ODPI [00423] 2019-05-21 14:57:55.188: fn end dpiStmt_getInfo(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:57:55.188: fn start dpiStmt_setFetchArraySize(0x25104e0)
ODPI [00423] 2019-05-21 14:57:55.188: fn end dpiStmt_setFetchArraySize(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:57:55.188: fn start dpiStmt_execute(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.079: fn end dpiStmt_execute(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.079: fn start dpiStmt_getRowCount(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.079: fn end dpiStmt_getRowCount(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.079: fn start dpiStmt_getQueryInfo(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.079: fn end dpiStmt_getQueryInfo(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.079: fn start dpiConn_newVar(0x2859710)
ODPI [00423] 2019-05-21 14:58:06.079: fn end dpiConn_newVar(0x2859710) -> 0
ODPI [00423] 2019-05-21 14:58:06.079: fn start dpiVar_getSizeInBytes(0x2805c50)
ODPI [00423] 2019-05-21 14:58:06.079: fn end dpiVar_getSizeInBytes(0x2805c50) -> 0
ODPI [00423] 2019-05-21 14:58:06.079: fn start dpiStmt_define(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.080: fn end dpiStmt_define(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.080: fn start dpiStmt_fetchRows(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.082: fn end dpiStmt_fetchRows(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.082: fn start dpiStmt_fetchRows(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.083: fn end dpiStmt_fetchRows(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.083: fn start dpiStmt_fetchRows(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.084: fn end dpiStmt_fetchRows(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.084: fn start dpiStmt_fetchRows(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.085: fn end dpiStmt_fetchRows(0x25104e0) -> 0
DB: Got results: 399
ODPI [00423] 2019-05-21 14:58:06.086: fn start dpiVar_release(0x2805c50)
ODPI [00423] 2019-05-21 14:58:06.086: fn end dpiVar_release(0x2805c50) -> 0
ODPI [00423] 2019-05-21 14:58:06.086: fn start dpiStmt_close(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.086: fn end dpiStmt_close(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.086: fn start dpiStmt_release(0x25104e0)
ODPI [00423] 2019-05-21 14:58:06.086: fn end dpiStmt_release(0x25104e0) -> 0
ODPI [00423] 2019-05-21 14:58:06.086: fn start dpiConn_release(0x2859710)
ODPI [00423] 2019-05-21 14:58:06.094: fn end dpiConn_release(0x2859710) -> 0
If you look at the time difference between the start of the call to dpiStmt_execute()
and resulting failure, you'll note it is almost exactly 5 minutes (300 seconds). That suggests network configuration -- but you can run it several times to see if you consistently get the 300 seconds.
And if the network is flaky in any way, working and not working in quick succession is quite normal. I am not a network guru, though! :-)
Really weird though:
I run my script like so: python3 -i -- main.py
def run_sql(sql):
results = None
db_string = get_db_string()
try:
db_conn = cx_Oracle.connect(db_string)
cur = db_conn.cursor()
cur.execute(sql)
results = [x[0] for x in cur.fetchall()]
cur.close()
db_conn.close()
except DatabaseError as e:
logger.error(e)
return results
Which means am starting over each time, right? why would restarting python bring the problem back?
That is weird! What is the sql you are executing? Can you try something as simple as select 1 from dual
?
yes, that simple sql seems to work fine! and my actual sql is in the debug log I pasted earleir. It's not too complicated and only returns couple hundered rows, when it works.
ODPI [01168] 2019-05-21 15:38:46.371: fn end dpiVar_getSizeInBytes(0x1ee4980) -> 0
ODPI [01168] 2019-05-21 15:38:46.371: fn start dpiStmt_define(0x1eca280)
ODPI [01168] 2019-05-21 15:38:46.371: fn end dpiStmt_define(0x1eca280) -> 0
ODPI [01168] 2019-05-21 15:38:46.372: fn start dpiStmt_fetchRows(0x1eca280)
ODPI [01168] 2019-05-21 15:38:46.372: fn end dpiStmt_fetchRows(0x1eca280) -> 0
DB: Got results: 1
ODPI [01168] 2019-05-21 15:38:46.372: fn start dpiVar_release(0x1ee4980)
ODPI [01168] 2019-05-21 15:38:46.372: fn end dpiVar_release(0x1ee4980) -> 0
ODPI [01168] 2019-05-21 15:38:46.372: fn start dpiStmt_close(0x1eca280)
ODPI [01168] 2019-05-21 15:38:46.372: fn end dpiStmt_close(0x1eca280) -> 0
ODPI [01168] 2019-05-21 15:38:46.372: fn start dpiStmt_release(0x1eca280)
Hmm, I'm not sure what to say. @cjbj may be able to help further when he is awake...but in the meantime I'd suggest exploring the boundaries of the failure -- seeing whether the amount of time the execute takes is related to the failure in any way -- or the amount of data. It looks like your SQL contains references to other databases via database link? Perhaps try fetching a single row from a view that references a database link?
That database link might explain things, though -- why it fails the first time (establishing the connection via the database link takes some time?) and works fine the rest of the times?
Great! thanks a lot. I will dig further.
Hmmm, I think my issue was a little extra space!!!
PS_JOB @OPSUSER_CS_HR D
Thank you so much for the tips!
Is there a way to increase the verbosity level of this module so as to get debugging or even tracing logs?
Frequently my connections fail with this output:
And wonder if I could get more details, like did it connect first properly, did it send sql statement and was waiting on response?
Thanks a lot.