[knx] NPE when bundle is stopped

kaikreuzer commented 4 years ago

Just found this in my log:

2020-08-10 22:22:48.141 [ERROR] [calimero.link.192.168.0.5:3671      ] - send error, closing link
java.lang.InterruptedException: null
    at java.lang.Object.wait(Native Method) ~[?:?]
    at tuwien.auto.calimero.knxnetip.ConnectionBase.waitForStateChange(ConnectionBase.java:541) ~[bundleFile:?]
    at tuwien.auto.calimero.knxnetip.ConnectionBase.send(ConnectionBase.java:242) ~[bundleFile:?]
    at tuwien.auto.calimero.knxnetip.KNXnetIPTunnel.send(KNXnetIPTunnel.java:177) ~[bundleFile:?]
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.onSend(KNXNetworkLinkIP.java:264) [bundleFile:?]
    at tuwien.auto.calimero.link.AbstractLink.send(AbstractLink.java:332) [bundleFile:?]
    at tuwien.auto.calimero.link.KNXNetworkLinkIP.sendRequestWait(KNXNetworkLinkIP.java:243) [bundleFile:?]
    at tuwien.auto.calimero.mgmt.TransportLayerImpl.sendData(TransportLayerImpl.java:372) [bundleFile:?]
    at tuwien.auto.calimero.mgmt.ManagementClientImpl.send(ManagementClientImpl.java:797) [bundleFile:?]
    at tuwien.auto.calimero.mgmt.ManagementClientImpl.sendWait2(ManagementClientImpl.java:824) [bundleFile:?]
    at tuwien.auto.calimero.mgmt.ManagementClientImpl.readDeviceDesc(ManagementClientImpl.java:447) [bundleFile:?]
    at tuwien.auto.calimero.mgmt.ManagementProceduresImpl.isAddressOccupied(ManagementProceduresImpl.java:310) [bundleFile:?]
    at org.openhab.binding.knx.internal.client.AbstractKNXClient.isReachable(AbstractKNXClient.java:341) [bundleFile:?]
    at org.openhab.binding.knx.internal.handler.AbstractKNXThingHandler.pollDeviceStatus(AbstractKNXThingHandler.java:148) [bundleFile:?]
    at org.openhab.binding.knx.internal.handler.AbstractKNXThingHandler.lambda$1(AbstractKNXThingHandler.java:188) [bundleFile:?]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
    at java.lang.Thread.run(Thread.java:834) [?:?]

Instead of this exception, the thread is expected to be terminated gracefully.

J-N-K commented 3 years ago

The reason is that AbstractKNXThingHandler.dispose() calls AbstractKNXThingHandler.detachFromClient() which has the following code

if (pollingJob != null) {
            pollingJob.cancel(true);
            pollingJob = null;
        }

The pollingJob calls pollDeviceStatus which is interrupted and causes the error above (there is an Object.wait in waitForStateChange and the InterruptedException from there is finally caught in KNXNetworkLinkIP. Probably changing to cancel(false) would avoid the exception but lead to a delayed disposal, depending on the timeout of the waitForStateChange-call.

holgerfriedrich commented 1 year ago

Still there with current implementation. Needs some special handling of interruption.

2022-12-11 22:17:21.307 [ERROR] [calimero.link./dev/ttyAMA0          ] - send error, closing link
java.lang.InterruptedException: null
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1261) ~[?:?]
        at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:317) ~[?:?]
        at tuwien.auto.calimero.serial.FT12Connection.send(FT12Connection.java:388) ~[?:?]
        at tuwien.auto.calimero.link.KNXNetworkLinkFT12.onSend(KNXNetworkLinkFT12.java:145) ~[?:?]
        at tuwien.auto.calimero.link.AbstractLink.send(AbstractLink.java:388) ~[?:?]
        at tuwien.auto.calimero.link.AbstractLink.sendRequestWait(AbstractLink.java:312) ~[?:?]
        at tuwien.auto.calimero.mgmt.TransportLayerImpl.sendDisconnect(TransportLayerImpl.java:604) ~[?:?]
        at tuwien.auto.calimero.mgmt.TransportLayerImpl.disconnectIndicate(TransportLayerImpl.java:585) ~[?:?]
        at tuwien.auto.calimero.mgmt.TransportLayerImpl.sendData(TransportLayerImpl.java:367) ~[?:?]
        at tuwien.auto.calimero.mgmt.ManagementClientImpl.send(ManagementClientImpl.java:1275) ~[?:?]
        at tuwien.auto.calimero.mgmt.ManagementClientImpl.send(ManagementClientImpl.java:1259) ~[?:?]
        at tuwien.auto.calimero.mgmt.ManagementClientImpl.sendWait2(ManagementClientImpl.java:1295) ~[?:?]
        at tuwien.auto.calimero.mgmt.ManagementClientImpl.readDeviceDesc(ManagementClientImpl.java:595) ~[?:?]
        at tuwien.auto.calimero.mgmt.ManagementProceduresImpl.isAddressOccupied(ManagementProceduresImpl.java:391) ~[?:?]
        at org.openhab.binding.knx.internal.client.AbstractKNXClient.isReachable(AbstractKNXClient.java:435) ~[?:?]
        at org.openhab.binding.knx.internal.handler.AbstractKNXThingHandler.pollDeviceStatus(AbstractKNXThingHandler.java:148) ~[?:?]
        at org.openhab.binding.knx.internal.handler.AbstractKNXThingHandler.lambda$1(AbstractKNXThingHandler.java:190) ~[?:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

lsiepel commented 1 year ago

Not sure, but if i understand it right, the synchronized method call to AbstractKNXClient.isReachable from pollDeviceStatus is interrrupted. And throws an InterruptedException this exception is not catched within AbstractKNXThingHandler.pollDeviceStatus. So i think it might be fixed with adding the InterruptedException to the catch block at AbstractKNXThingHandler.pollDeviceStatus:165

Nevermind. That was not the case.

holgerfriedrich commented 1 year ago

The stacktrace is actually printed by Calimero library:

protected void onSend(final KNXAddress dst, final byte[] msg, final boolean waitForCon) throws KNXTimeoutException, KNXLinkClosedException
{
    try {
        ...
    } catch (KNXPortClosedException | InterruptedException e) {
        logger.error("send error, closing link", e);
        ...
        throw new KNXLinkClosedException("link closed, " + e.getMessage());
    }
}

We cannot catch this.

During the binding shutdown / disconnect / etc., several Futures are cancelled with Interruption. And this is quite crucial, as the resources need to be released (for example the number of tunnel connections on the IP interface may be limited, so we want to be sure the link is closed). It may be worth to issue a PR for Calimero to change the logging behavior (e.g. logging the Stacktrace only if ``logger.isTraceEnabled()```. Any suggestions?

lsiepel commented 1 year ago

If i understand it right, this is intentional behaviour. Just logging is more verbose then expected. Even without the stacktrace, i don't think it would do any good to log this at error level when the binding is perfectly capable of handling this situation. Could we in some how ignore/filter this calimero log messages by default ?

openhab / openhab-addons

[knx] NPE when bundle is stopped #8277