openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.9k stars 3.59k forks source link

[dsmr] Exceptions in rfc2217 connector brings down the bridge until manual action is made to reload it #15529

Open nagisa opened 1 year ago

nagisa commented 1 year ago

Expected Behavior

Random connectivity-related exceptions should be recovered from automatically.

Current Behavior

I’ve noticed data stopped being updated from my DSMR electricity meter. Investigating revealed that the DSMR reader was reachable and providing data to anybody connecting over telnet just fine. However, OpenHAB’s UI was showing that the binding was "down". I found the following (singular) error in my system logs (i.e. this was output to openhab’s stderr):

Aug 31 23:03:50
Exception in thread "OH-binding-dsmr:dsmrBridge:slimmelezer" java.lang.IllegalStateException: port is closed
    at gnu.io.rfc2217.TelnetSerialPort$State.checkNotClosed(TelnetSerialPort.java:183)
    at gnu.io.rfc2217.TelnetSerialPort.setRTS(TelnetSerialPort.java:563)
    at org.openhab.core.io.transport.serial.rxtx.RxTxSerialPort.setRTS(RxTxSerialPort.java:119)
    at org.openhab.binding.dsmr.internal.device.connector.DSMRSerialConnector.close(DSMRSerialConnector.java:209)
    at org.openhab.binding.dsmr.internal.device.DSMRSerialAutoDevice.stop(DSMRSerialAutoDevice.java:171)
    at org.openhab.binding.dsmr.internal.device.DSMRDeviceRunnable.run(DSMRDeviceRunnable.java:95)
    at java.base/java.lang.Thread.run(Thread.java:833)

and the following errors in userdata/logs/openhab.log:

2023-08-31 23:03:50.565 [WARN ] [r.internal.device.DSMRDeviceRunnable] - DSMRDeviceRunnable stopped with a RuntimeException
java.lang.IllegalStateException: Unable to establish remote connection to serial port rfc2217://192.168.1.99:23
    at org.openhab.core.io.transport.serial.rxtx.rfc2217.internal.SerialPortIdentifierImpl.open(SerialPortIdentifierImpl.java:60) ~[?:?]
    at org.openhab.binding.dsmr.internal.device.connector.DSMRSerialConnector.open(DSMRSerialConnector.java:138) ~[?:?]
    at org.openhab.binding.dsmr.internal.device.connector.DSMRSerialConnector.open(DSMRSerialConnector.java:120) ~[?:?]
    at org.openhab.binding.dsmr.internal.device.connector.DSMRSerialConnector.restart(DSMRSerialConnector.java:263) ~[?:?]
    at org.openhab.binding.dsmr.internal.device.DSMRSerialAutoDevice.restart(DSMRSerialAutoDevice.java:165) ~[?:?]
    at org.openhab.binding.dsmr.internal.device.DSMRDeviceRunnable.run(DSMRDeviceRunnable.java:84) ~[?:?]
    at java.lang.Thread.run(Thread.java:833) ~[?:?]
Caused by: java.net.SocketTimeoutException: Connect timed out
    at sun.nio.ch.NioSocketImpl.timedFinishConnect(NioSocketImpl.java:546) ~[?:?]
    at sun.nio.ch.NioSocketImpl.connect(NioSocketImpl.java:597) ~[?:?]
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:327) ~[?:?]
    at java.net.Socket.connect(Socket.java:633) ~[?:?]
    at org.apache.commons.net.SocketClient._connect(SocketClient.java:132) ~[?:?]
    at org.apache.commons.net.SocketClient.connect(SocketClient.java:268) ~[?:?]
    at org.openhab.core.io.transport.serial.rxtx.rfc2217.internal.SerialPortIdentifierImpl.open(SerialPortIdentifierImpl.java:57) ~[?:?]
    ... 6 more

This seems to be a prime example of roughly a spurious/transient error which should be recovered from automatically.

Possible Solution

:shrug:

Steps to Reproduce (for Bugs)

  1. I have no clue. The exception log suggests that it should be fairly straightforward to reproduce by having a mock telnet server that closes a socket after some time or something like that.

My thing is defined as such:

Bridge dsmr:dsmrBridge:slimmelezer [serialPort="rfc2217://192.168.1.99:23"] {
    Things:
        electricity_v5_0 electricityV5 [channel=0]
}

Context

I’m not aware of any network interruptions but even if there were any, I believe the expectation is that openhab generally reconnects when the connectivity restored.

Your Environment

lsiepel commented 2 months ago

It might already be enough to catch the exception so that the thread is not killed. I created a new jar for openHAB 4.2.0 + download here: https://1drv.ms/u/s!AnMcxmvEeupwj4cVrMVDozaSqUKlbA?e=higGnK

Let me know your test results. If the issue occurs, you should have a debug line showing : "Failed communicating, probably time out" + stacktrace. So it is usefull to have your binding logging into debug.

  1. Upgrade to openhab 4.2.1+
  2. Uninstall current binding
  3. drop the jar in your addon folder
  4. set the binding into debug log mode
  5. Scan your log for the givven phrase and report your results.