Closed mikegong closed 2 years ago
Maybe your problem is nothing to do with python, but instead linked your HADR and clidriver/connection-string configuration?
Please provide the following details:
(1) Has the DBA correctly enabled automatic client reroute on both the primary and secondaries? Give details how that configuration was performed.
(2) How have you verified that the HADR failover/takeover correctly moved the virtual IP-address/hostname to the standby?
(3) What is the version of your clidriver for python ibm_db (or Db2 driver) - that is to say, what is the text output of its db2level
command on the hostname where python runs?
(4) Is your Db2-server configuration a purescale environment?
(5) On your client workstation where python runs, are you using a DSN (configured in db2dsdriver.cfg), or are you hardcoding the connection-string in the python code in the ibm_db_connect() ?
(6) in your db2dsdriver.cfg
on the client, or in your connection-string in python, have you set QueryTimeout, or any other ACR related configuration setting?
The Db2-LUW documentation describes at least eight parameters (for db2dsdriver.cfg
or connection string) that control how the automatic client reroute behaves for non-java clients (i.e. for cli, like python). Pay attention to their default values, which might differ from your assumptions, and test different settings in your environment.
See the documentation here https://www.ibm.com/docs/en/db2/11.5?topic=njcshacd-configuration-db2-automatic-client-reroute-support-applications-other-than-java
Maybe your problem is nothing to do with python, but instead linked your HADR and clidriver/connection-string configuration?
Please provide the following details:
(1) Has the DBA correctly enabled automatic client reroute on both the primary and secondaries? Give details how that configuration was performed.
(2) How have you verified that the HADR failover/takeover correctly moved the virtual IP-address/hostname to the standby?
(3) What is the version of your clidriver for python ibm_db (or Db2 driver) - that is to say, what is the text output of its
db2level
command on the hostname where python runs?(4) Is your Db2-server configuration a purescale environment?
(5) On your client workstation where python runs, are you using a DSN (configured in db2dsdriver.cfg), or are you hardcoding the connection-string in the python code in the ibm_db_connect() ?
(6) in your
db2dsdriver.cfg
on the client, or in your connection-string in python, have you set QueryTimeout, or any other ACR related configuration setting?The Db2-LUW documentation describes at least eight parameters (for
db2dsdriver.cfg
or connection string) that control how the automatic client reroute behaves for non-java clients (i.e. for cli, like python). Pay attention to their default values, which might differ from your assumptions, and test different settings in your environment.See the documentation here https://www.ibm.com/docs/en/db2/11.5?topic=njcshacd-configuration-db2-automatic-client-reroute-support-applications-other-than-java
Hi imavo,
Thanks a lot for the information! Here is my test, using connection string, no db2dsdriver.cfg setup.
1) ibm_db.connect() to primary node with success 2) ibm_db.close() 3) ibm_db.connect() to primary node again 4) step 3 was hung
In db2diag.log on client side,
FUNCTION: DB2 UDB, common communication, sqlcctcpconnr, probe:42
MESSAGE : DIA3222E The host name "
FUNCTION: DB2 UDB, DRDA Communication Manager, sqljcCommConnect, probe:10 MESSAGE : ZRC=0x81360012=-2127167470=SQLZ_RC_CMERR, SQLT_SQLJC "External Comm error" DATA #1 : String, 11 bytes CCI Error: DATA #2 : unsigned integer, 4 bytes 97
FUNCTION: DB2 UDB, common communication, sqlcctcplogIPAddress, probe:50
MESSAGE : DIA3220W Client sent data from IP address = "
FUNCTION: DB2 UDB, DRDA Application Requester, sqljrRetrySetup, probe:14 MESSAGE : ZRC=0x8037006D=-2143879059=SQLJR_CABLT "CA BUILT" DATA #1 : String, 30 bytes Client Reroute is starting....
FUNCTION: DB2 UDB, DRDA Application Requester, sqljrRetrySetup, probe:15 DATA #1 : String, 50 bytes The correlation token of the failed connection is DATA #2 : String, 32 bytes
FUNCTION: DB2 UDB, DRDA Application Requester, sqljrListToConnect, probe:20 DATA #1 : String, 40 bytes Reconnecting to Hostname/IP Address --> DATA #2 : String, 13 bytes
ACR cannot operate properly when the Db2-server configuration is inadequate, and Db2-client configuration is inadquate, and no workstation specific workaround is specified.
See the technote https://www.ibm.com/support/pages/db2-automatic-client-reroute-alternate-server-list-srvrlstxml-contains-incorrect-or-unresolvable-hostname
This is not a matter for python ibm_db, it is purely a configuration matter for the Db2-server and Db2-client configuration.
You can prove this by temporarily removing python-and-ibm_db by recreating your test scenario with the db2cli
tooling - which can run a connection test at the command line for properly configured clidriver db2dsdriver.cfg
configurations.
I tested Db2 native connection by setting up catalog, and JDBC connection with connection string. There was no problem. There was also no problem with python ibm_db without Db2 client.
jdbc does not share any code with any ibm supplied CLI client. python ibm_db will not use a jdbc driver , although bridging software can force this to happen.
Consequently for the purposes of this question, the jdbc behaviour is irrelevant for CLI driver operation.
The python ibm_db module requires a Db2 client to communicate with Db2, and will not connect withou a CLI interface provider (e.g a CLI driver). For that reason, python ibm_db installs by default the clidriver (a tiny footprint Db2 client ), although python ibm_db can use any comapatible CLI driver if so configured , from IBM or third parties.
More tests show this hung problem occurred on AIX but not on RedHat Linux. This hung problem doesn't happen with JDBC driver either. Kindly please consider removing hung from ibm_db driver. A workaround could be adding a config parameter to disable ACR from ibm_db connection. BTW, ibm_db connection crashed on AIX with Db2 CLI driver. Thanks!
Hi @mikegong
I have tested the HADR failover scenario using python application and were able to complete my tests without any error.
Please find below the details of my analysis.
import ibm_db
import time
print("IBM_DB_VERSION : " + str(ibm_db.__version__))
print("=> Creating Connection")
#conn = ibm_db.connect("DATABASE=HADRDB;HOSTNAME=HADRHOSTNAME;PORT=PORT;UID=USER;PWD=PASSWORD;PROTOCOL=TCPIP",'','')
conn = ibm_db.connect("DSN=MEM1;UID=USER;PWD=PASSWORD",'','')
print("=> Creating Tables")
try:
res = ibm_db.exec_immediate(conn,"DROP TABLE HADR_TEST")
except:
print("Already dropped")
create_stmt = "Create TABLE HADR_TEST (c1 int)"
print("=> RUNNING : " + create_stmt)
try:
result = ibm_db.exec_immediate(conn,create_stmt)
print(result)
except:
#result = ibm_db.exec_immediate(conn,"DROP TABLE hadr_test")
#result = ibm_db.exec_immediate(conn,create_stmt)
print("=> Table exists....re-created")
insert_val = "insert into hadr_test values (1)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
insert_val = "insert into hadr_test values (2)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
insert_val = "insert into hadr_test values (3)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
time.sleep(40)
insert_val = "insert into hadr_test values (4)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
insert_val = "insert into hadr_test values (5)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
insert_val = "insert into hadr_test values (6)"
print("=> RUNNING : " + insert_val)
try:
ibm_db.exec_immediate(conn,insert_val)
except Exception as e:
print(str(e))
print("=> Getting field type of the table")
res = "select * from hadr_test"
exe = ibm_db.exec_immediate(conn,res)
row = ibm_db.fetch_tuple(exe)
while row:
print(row[0])
row = ibm_db.fetch_tuple(exe)
In this case I have configured the alternate server list in the db2dsdriver.cfg file and used the alias name to create connection to the active member of the HADR. Once the first connection is established, during the sleep time in the application code I am switching the HADR members roles and once the application comes out from the sleep I could see the FAILOVER happening and all workloads are re-routed via the next member.
CFG FILE USE
<configuration>
<dsncollection>
<dsn alias="MEM1" name="HADRDB" host="HOSTNAME1" port="PORT1"/>
</dsncollection>
<databases>
<database name="HADRDB" host="HOSTNAME1" port="PORT1">
<acr>
<parameter name="enableACR" value="true"/>
<parameter name="enableSeamlessACR" value="false"/>
<alternateserverlist>
<server name="S1" hostname="HOSTNAME1" port="PORT1"/>
<server name="S2" hostname="HOSTNAME2" port="PORT2"/>
</alternateserverlist>
<affinitylist>
<list name="list1" serverorder="S1,S2"/>
</affinitylist>
<clientaffinitydefined>
<client name="client1" hostname="localhost" listname="list1"/>
</clientaffinitydefined>
</acr>
</database>
</databases>
<parameters>
</parameters>
</configuration>
OUTPUT
python3 hadr1.py
IBM_DB_VERSION : 3.0.4
=> Creating Connection
=> Creating Tables
=> RUNNING : Create TABLE HADR_TEST (c1 int)
<ibm_db.IBM_DBStatement object at 0x7f195548c3b0>
=> RUNNING : insert into hadr_test values (1)
=> RUNNING : insert into hadr_test values (2)
=> RUNNING : insert into hadr_test values (3)
=> RUNNING : insert into hadr_test values (4)
[IBM][CLI Driver][DB2/NT64] SQL30108N A connection failed in an automatic client reroute environment. The transaction was rolled back. Host name or IP address: "HOSTNAME". Service name or port number: "50000". Reason code: "1". Connection failure code: "2". Underlying error: "*". SQLSTATE=08506 SQLCODE=-30108
=> RUNNING : insert into hadr_test values (5)
=> RUNNING : insert into hadr_test values (6)
=> Getting field type of the table
1
2
3
5
6
In this case the alternate server is configured as part of the server configuration using the command
db2 update alternate server for database DATABASE_NAME using hostname HOSTNAME2 port PORT2
So the failover details are already configured as part of the server configuration.
Hence in this case the application used full connection string for making the first connection and during the sleep the member role was switched.
Trace shows the failover happened during the role switch.
SQLExecDirectW( hStmt=1:3, pszSqlStr="insert into hadr_test values (4)", cbSqlStr=-3 )
---> Time elapsed - +0.000000E+000 seconds
( StmtOut="insert into hadr_test values (4)" )
( Package="SYSSH200 ", Section=65 )
( ROLLBACK REPLY RECEIEVED=1 )
( ROLLBACK REPLY RECEIEVED=1 )
( Unretrieved error message="[IBM][CLI Driver][DB2/NT64] SQL30108N A connection failed in an automatic client reroute environment. The transaction was rolled back. Host name or IP address: "HOSTNAME". Service name or port number: "PORT". Reason code: "1". Connection failure code: "2". Underlying error: "*". SQLSTATE=08506
" )
( Unretrieved error message="[IBM][CLI Driver][DB2/NT64] SQL30108N A connection failed in an automatic client reroute environment. The transaction was rolled back. Host name or IP address: "HOSTNAME". Service name or port number: "PORT". Reason code: "1". Connection failure code: "2". Underlying error: "*". SQLSTATE=08506
" )
( StmtOut="insert into hadr_test values (4)" )
( Package="SYSSH200 ", Section=65 )
( COMMIT REPLY RECEIVED=1 )
SQLExecDirectW( )
<--- SQL_SUCCESS Time elapsed - +0.000000E+000 seconds
OUTPUT
python3 hadr1.py
IBM_DB_VERSION : 3.0.4
=> Creating Connection
=> Creating Tables
=> RUNNING : Create TABLE HADR_TEST (c1 int)
<ibm_db.IBM_DBStatement object at 0x7f7bd88b73b0>
=> RUNNING : insert into hadr_test values (1)
=> RUNNING : insert into hadr_test values (2)
=> RUNNING : insert into hadr_test values (3)
=> RUNNING : insert into hadr_test values (4)
=> RUNNING : insert into hadr_test values (5)
=> RUNNING : insert into hadr_test values (6)
=> Getting field type of the table
1
2
3
4
5
6
Please let me know if the above explanation is in par with your application handling as well.
Thanks
Is there any update on this case.
In case the explanation provided above work can this issue be closed?
Closing on the case as there is no further update on this. Explanation with example provided.
ibm_db.connect was hung in its second connection. Tested to connect to the same HADR primary server, first connection use host name, second connection use IP address, or vise versa, worked. If use the same host name or same IP address in two connections, it did not work. This problem only happened with HADR database. There is no problem to connect to a standalone database repeatedly.
It appeared that something was cached and messed things up. Is there anyway to workaround this? Thanks.