steveohara / j2mod

Enhanced Modbus library implemented in the Java programming language
Apache License 2.0

TCP Connection idle timeout #108

Closed akochubey2004 closed 3 years ago

akochubey2004 commented 3 years ago

Hi Steve,

First, thank you very much for working on j2mod. I've been using the original jamod in my projects for quite a while, and I was always worried that its error handling and similar details were a bit "childish". The j2mod code looks much better.

Still, there is an issue. My Modbus/TCP slaves sometimes become unresponsive: a slave will accept incoming connections but never respond to anything. I tracked this down to cases where a connection suddenly drops, for example when a master device is abruptly powered off or a LAN cable is disconnected.

A TCP connection will not detect a lost peer if it is not actively sending any data. So, if the LAN cable is disconnected between requests, ModbusTCPTransport simply hangs in Socket.read(), waiting for data that will never arrive and occupying a thread in the thread pool. Eventually the thread pool is depleted and the slave stops processing requests.
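To make the blocking read concrete, here is a minimal sketch that is independent of j2mod: a loopback connection whose peer stays silent, with the read bounded by `Socket.setSoTimeout`. The class and method names (`ReadTimeoutSketch`, `readTimesOut`) are hypothetical. A socket timeout is the conventional way to bound such a read; it is shown only to illustrate the behavior, not as what j2mod does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutSketch {

    /**
     * Opens a loopback connection whose peer never sends anything, then
     * attempts a blocking read with the given SO_TIMEOUT (milliseconds).
     * Returns true if the read was abandoned with SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket silentPeer = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // A value of 0 would mean "no timeout": the read below would block forever.
            accepted.setSoTimeout(timeoutMs);
            InputStream in = accepted.getInputStream();
            try {
                in.read(); // the peer never writes, so this can only time out
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

Without the `setSoTimeout` call, the thread executing `in.read()` would stay parked for as long as the silent connection exists, which is how a pool of handler threads can be exhausted.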

There are a couple of bugs in the JDK (https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8075484 and related ones), which are "fixed" but are still reproducible on some machines (I've been using 8u201 on CentOS 6 and the problem is still there).

Please review this pull request. I've added a watchdog timer for the TCP connection, which closes the connection if there is no activity within a given timeframe. This works as a safety net: if the master dies, the connection will eventually be closed, releasing its thread back to the thread pool.
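The idea of such a watchdog can be sketched as follows. This is not the code from the pull request: the class `IdleWatchdogSketch` and its methods are hypothetical, and only the general technique (record the time of the last activity, check it periodically against an idle limit, close the connection once the limit is exceeded) follows the description above.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical idle watchdog: closes a connection after a period without activity. */
public class IdleWatchdogSketch implements AutoCloseable {
    private final Closeable connection;
    private final long maxIdleMillis;
    private final ScheduledExecutorService timer;
    private final AtomicBoolean expired = new AtomicBoolean(false);
    // Monotonic timestamp of the last observed activity.
    private volatile long lastActivityNanos = System.nanoTime();

    public IdleWatchdogSketch(Closeable connection, long maxIdleMillis) {
        this.connection = connection;
        this.maxIdleMillis = maxIdleMillis;
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-watchdog");
            t.setDaemon(true); // never keep the JVM alive just for the watchdog
            return t;
        });
        // Check several times per idle period so expiry is detected promptly.
        long period = Math.max(1, maxIdleMillis / 4);
        timer.scheduleAtFixedRate(this::check, period, period, TimeUnit.MILLISECONDS);
    }

    /** Call whenever a request or response passes over the connection. */
    public void touch() {
        lastActivityNanos = System.nanoTime();
    }

    public boolean isExpired() {
        return expired.get();
    }

    private void check() {
        long idleMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - lastActivityNanos);
        if (idleMillis >= maxIdleMillis && expired.compareAndSet(false, true)) {
            try {
                // Closing a socket from another thread unblocks a thread stuck in read().
                connection.close();
            } catch (IOException ignored) {
                // Nothing useful to do: the connection is being abandoned anyway.
            }
            timer.shutdown();
        }
    }

    /** Stops the watchdog without closing the connection. */
    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

A handler would wrap its socket with `new IdleWatchdogSketch(socket, idleTimeoutMillis)` (`java.net.Socket` implements `Closeable`), call `touch()` after each request, and call `close()` on the watchdog when the connection ends normally.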

Please feel free to request any changes if the proposed changes don't suit your design, and I will make them. (It seems to me that TCPConnectionHandler is the right place for the watchdog, but I'm not sure.)

Sincerely, Anatoly

P.S. I couldn't find any documentation on building and running the unit tests; could you please point me to it, if it exists? Some tests (TestModbusTCPExternalRead, for example) seem to rely on an external tool (?) and I was unable to run them.

steveohara commented 3 years ago

Hi Anatoly, thanks, I'll take a look and come back to you.

For testing, all the 3rd-party tools required are included in the package and are installed when you run the tests. The only exception is the serial tests, which can only be run on Windows, so those are skipped if you run the tests on anything else.

steveohara commented 3 years ago

I've tweaked your changes a little and built them into a 2.7.0-SNAPSHOT from the development branch, so please download it and see if it looks good, and then I will make a master release.

akochubey2004 commented 3 years ago

Hi Steve, thank you,

Your tweaks are all fine. Actually, I also had doubts about lastActivityTs / lastActivityTimestamp naming :-)

Wrt tests: I'm working on Linux (CentOS 6) and I had a problem running the tests based on the "modpoll" utility. The problem seems to be in how localhost/127.0.0.1 is resolved:

13:01:51 anatoly@anatoly modpoll-1597053427810> ./modpoll -m tcp -p 2502 -a 15 -r 1 -t 0 -c 1 localhost 0
modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
Copyright (c) 2002-2013 proconX Pty Ltd
Visit http://www.modbusdriver.com for Modbus libraries and tools.

Protocol configuration: MODBUS/TCP
Slave configuration...: address = 15, start reference = 1, count = 1
Communication.........: localhost, port 2502, t/o 1.00 s, poll rate 1000 ms
Data type.............: discrete output (coil)

Can't reach server/slave! Check TCP/IP and firewall settings.

13:02:06 anatoly@anatoly modpoll-1597053427810> ./modpoll -m tcp -p 2502 -a 15 -r 1 -t 0 -c 1 127.0.0.1 0
modpoll 3.4 - FieldTalk(tm) Modbus(R) Master Simulator
Copyright (c) 2002-2013 proconX Pty Ltd
Visit http://www.modbusdriver.com for Modbus libraries and tools.

Protocol configuration: MODBUS/TCP
Slave configuration...: address = 15, start reference = 1, count = 1
Communication.........: 127.0.0.1, port 2502, t/o 1.00 s, poll rate 1000 ms
Data type.............: discrete output (coil)

Written 1 reference.

So, it works with 127.0.0.1, but doesn't work with "localhost".

The slave is listening on "all interfaces", as expected:

13:01:45 anatoly@anatoly modpoll-1597053427810> netstat -apn | grep 2502
tcp 0 0 :::2502 :::* LISTEN 26500/java

and "localhost" resolves correctly:

13:02:14 anatoly@anatoly modpoll-1597053427810> ping localhost
PING localhost (127.0.0.1) 56(84) bytes of data.
64 bytes from localhost (127.0.0.1): icmp_seq=1 ttl=64 time=0.027 ms

I will look into it further when I have more time. It is probably some local configuration issue, I think.

steveohara commented 3 years ago

Swapped to using 127.0.0.1 in the tests and released 2.7.0