openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.88k stars 3.59k forks source link

[Homematic] Failure restarting BIN-RPC server, status not updated #2352

Closed vbier closed 6 years ago

vbier commented 7 years ago

Every hour, the binding tries to restart the BIN-RPC server. From time to time, this fails with a BindException, presumably because the port is already in use (even though it was used by the binding without problems before the restarting the server).

Whenever the restart fails, the status of the bridge stays ONLINE, even though nothing works any more. The ReconnectThread simply logs a debug message and tries to connect again after the reconnect interval has expired. This attempt then succeeds and everything works again.

This is the stack trace of the exception when restarting the server fails:

2017-05-30 04:12:47.367 [DEBUG] [nal.communicator.server.BinRpcServer] - Stopping BIN-RPC server
2017-05-30 04:12:47.369 [DEBUG] [nal.communicator.server.BinRpcServer] - Initializing BIN-RPC server at port 9126
2017-05-30 04:12:47.373 [DEBUG] [ommunicator.AbstractHomematicGateway] - Address already in use (Bind failed)
java.net.BindException: Address already in use (Bind failed)
        at java.net.PlainSocketImpl.socketBind(Native Method)[:1.8.0_131]
        at java.net.AbstractPlainSocketImpl.bind(AbstractPlainSocketImpl.java:387)[:1.8.0_131]
        at java.net.ServerSocket.bind(ServerSocket.java:375)[:1.8.0_131]
        at java.net.ServerSocket.bind(ServerSocket.java:329)[:1.8.0_131]
        at org.openhab.binding.homematic.internal.communicator.server.BinRpcNetworkService.<init>(BinRpcNetworkService.java:46)[195:org.openhab.binding.homematic:2.1.0.201705151816]
        at org.openhab.binding.homematic.internal.communicator.server.BinRpcServer.start(BinRpcServer.java:42)[195:org.openhab.binding.homematic:2.1.0.201705151816]
        at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.startServers(AbstractHomematicGateway.java:228)[195:org.openhab.binding.homematic:2.1.0.201705151816]
        at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway.access$4(AbstractHomematicGateway.java:222)[195:org.openhab.binding.homematic:2.1.0.201705151816]
        at org.openhab.binding.homematic.internal.communicator.AbstractHomematicGateway$ReconnectThread.run(AbstractHomematicGateway.java:903)[195:org.openhab.binding.homematic:2.1.0.201705151816]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)[:1.8.0_131]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)[:1.8.0_131]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)[:1.8.0_131]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)[:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)[:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)[:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748)[:1.8.0_131]

Expected behavior would be that the restarting of the server works (the stop code should probably wait until resources have been freed). If this is not possible, than the binding should try to start the BIN-RPC server before the reconnect interval has expired (if the connection is gone, this is not exactly a reconnect scenario any more). At least the status should be set to OFFLINE so the user is aware of the fact that nothing works any more.

Openhab is running on a Raspberry Pi 3 with openhabian, openhab version is 2.1.0~2017-05-15.

MHerbst commented 7 years ago

I am also running the Homematic binding on a Rpi 3 and never had this problem. But I am using the release version of openHAB2. As far as I can see there have been some refactorings in the RPC server implementation but the reconnect itself was not modified and at the moment I have no idea why this error occurs. According to the coding the port should be reused if it has not been freed completely. Maybe there is a problem in the socket implementation on the Rpi. In this case it may help to wait some (milli)seconds and try to connect again. I agree with you that it would be correct to set the state to OFFLINE if the reconnect fails.

vbier commented 7 years ago

I have added a delay of 500ms between the stopping of the server and the subsequent start. This seems to have fixed my problems, openHAB is now running for a few days with a restart interval of 60s without problems. This seems to confirm my theory of resources that are not yet released on restarting the server...

MHerbst commented 7 years ago

Thanks for the test and feedback. In this case I think it would be the best to implement a retry loop that tries to reconnect several times and if it fails sets the state to OFFLINE.

vbier commented 6 years ago

Due to the fact that the bindings cyclic reconnect has been replaced by proper connection tracking this is no longer a problem for me.