Open sterankin opened 5 years ago
Hello Sterankin,
This is a very strange problem indeed. Have you tried removing the sleep statement? (that's a joke)
I've noticed that the jruby version and adapter version are a bit out of date. There have been a few changes to db2/as400 part of the adapter. Updating may help to clear up the issue.
On the other hand, it almost sounds like a network issue. Where connection is dropped and ruby can't communicate with AS400. Still does not explain why the length of outage is always the same.
Does it always happen with the same type of query or to different ones at the time? Are there locks on the table it's trying to read?
It happens randomly in our sidekiq job, when we connect to db2, it doesn't happen on any particular query but it is generally the first query where we try to read a simple row from the DB2 database. But before it executes our SELECT query, it seems to execute the following:
Executing SQL Statement -->[SELECT 1 FROM sysibm.tables FETCH FIRST 1 ROWS ONLY].
And then it hangs for 16 mins. I think the driver generated query is some sort of keep alive sql? After it hangs for 16 mins, it starts up again and all works ok.
I thought it could be network and I am investigating that but the specific repeatable length of time seems weird. I am also investigating from the DB2/AS400 team if there is anything that could cause this at their end.
I will try updating the as400 library i am using to see if that helps, but I'd thought I would ask here in case anyone had any ideas.
Edit: Going through the massive java trace this error occurs too I will add to the original post:
as400: Connection TEST (1054077916) : Throwing exception, sqlState: 08003 reason: The connection does not exist. vendor code -99999.java.sql.SQLException: The connection does not exist.
The solution I have found through trial and error is to add the following to the jdbc:as400 database connection string:
socket timeout=20000;login timeout=20000;thread used=false;
I still get the SQLException, but it no longer hangs for 16 minutes, it only hangs for 20 seconds as specified in the connection string.
The cause is still unknown but at least it is a partial solution for now.
Using Jruby 9.17.3 with gem 'activerecord-jdbc-adapter', '1.3.16', and using the jt400.jar version 9.4.
I have a very strange issue when connecting to AS400. Very frequently there is a 16min delay when connection, it doesn't happen all the time but it happens a good proportion of the time and I have no explanation. The delay is always the same -> 16 mins 19 seconds every time!
Why does it wait this length of time and what could be causing this?
database.yml:
Here is the trace from the logs:
Wed Jul 17 11:02:14:059 as400: Driver AS/400 Toolbox for Java JDBC Driver (902660597) : connect called with URL: jdbc:as400://TEST.COMPANY.COM:449/BLAH;secure=true;trace=true;.
Wed Jul 17 11:02:14:058 UTC 2019 as400: Connection TEST (1054077916) : Throwing exception, sqlState: 08003 reason: The connection does not exist. vendor code -99999.java.sql.SQLException: The connection does not exist.
Wed Jul 17 11:02:14:057 UTC 2019 as400: Connection TEST (189823753) : Throwing exception. Actual exception: Communication link failure. (Connection timed out (Read failed)) sqlState: 08S01 vendor code -99999.java.sql.SQLException: Communication link failure. (Connection timed out (Read failed)) 2019-07-17 11:02:14.055 | Wed Jul 17 15:02:14:054 UTC 2019 as400: Connection TEST (189823753) : Throwing exception. Original exception: .java.net.SocketException: Connection timed out (Read failed)
2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@set..: Statement STMT0001 (1432999914) : Escape processing = "false".
2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Result set concurrency = "1007". 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Behavior Override = "0". 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Result set holdability = "1". 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400: Statement STMT0001 (1432999914) : Executing SQL Statement -->[SELECT 1 FROM sysibm.tables FETCH FIRST 1 ROWS ONLY]. 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400: Statement STMT0001 (1432999914) : Data to correlate statement with cursor Cursor CRSR0001 (818761950) . 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Result set type = "1003". 2019-07-17 10:45:51.260 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Query timeout = "0". 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Max field size = "0". 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Fetch direction = "1000". 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Escape processing = "true". 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400: Statement STMT0001 (1432999914) open. Parent: Connection TEST (189823753) . 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Fetch size = "0". 2019-07-17 10:45:51.259 | Wed Jul 17 14:45:51:259 UTC 2019 as400@AS400JDBCStatement.init: Statement STMT0001 (1432999914) : Max rows = "0".