openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.58k forks source link

[Homematic] Openhab loses connection to CCU after upgrade to OH2.2.0 #3090

Closed TMGFlyer closed 6 years ago

TMGFlyer commented 6 years ago

I'm running Homematic CCU1. Operation with OH2.1 on RaspPi (Raspbian) works fine, but after upgrade to OH2.2.0 the connection between OH and the CCU gets lost about all 30 seconds. Most of the time the connection resumes automatically, but after a while the connection gets lost completly. In such case the Homematic ended up in an undefined status.

Expected Behavior

Stable connection between OH (OH2.2.0) and CCU1 as it worked with OH2.1.0

Current Behavior

When running Homematic CCU, then starting OH2.2.0, the connection between both seems to be stable during setup phase. Some while after entering normal operation the Log-File shows e.g. 2018-01-06 22:39:33.599 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'bb33c5fa' Shortly later the connection is resumed automatically: 2018-01-06 22:39:39.887 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'bb33c5fa' This behaviour occurs again after 20 to 40 seconds again and again.

After some minutes (sometimes up to more than an hour) the connection between both systems brakes down completely. All other functions in OH are still running without any degradation. The Homematic CCU ends up in an undefined state, i.e. sometimes it is still accessable via the web interface, sometimes the CCU has to be reset completly.

Steps to Reproduce (for Bugs)

All other bindings and all items have been removed so that only the pure binding was established. --> no change

Your Environment

macmeck commented 6 years ago

I face the exact same issue on my system, also with a CCU1. This issue makes the whole Homematic setup unusable and going back to openHAB 2.1 ist just a pain...

Also, this has been mentioned in the community by other users before (https://community.openhab.org/t/homematic-binding-bridge-offline-communication-error/36840) so it seems to be quite reproducable on openHAB 2.2

FrankHer commented 6 years ago

Same issue here: System: RPi3 (Raspian, Jessie), Openhabian Image, Openhab 2.1, Homegear 0.7.8-1314 with Homematic things is working fine for weeks. Updated only Openhab to Version 2.2. System is running a few hours then crashes. Openhab + Homegear services are active. Openhab.log shows: Connection lost on gateway 'ccu' Connection resumed on gateway 'ccu' a few times, then crashes total. I cannot access Homematic things over 'Homematic Konfigurator'. Have to restart the system. Any ideas???

Andreas-Fink commented 6 years ago

Here the same Problems but not with a CCU1 but with Homegear 0.7.10-1397. In the most cases Homegear stays stable but shows up massive

RPC method calls in the log: RPC Server (Port 2001): Info: Client number 9197 is calling RPC method: getParamsetDescription (2) Parameters:

RPC Server (Port 2001): Info: Client number 9197 is calling RPC method: getDeviceInfo (2) Parameters:

Under some circumstances as well Homegear seams to struggle under the massive load of the requests.

If I can help to narrow down the problem, do not hesitate to contact me directly.

FrankHer commented 6 years ago

See also discussion in German on: https://forum.homegear.eu/t/probleme-nach-update-auf-openhab22/1805

TMGFlyer commented 6 years ago

I checked the link FrankHer posted and followed the advice. Unfortunatelly, it did not fix the problem on my side. Does anybody else have an idea?

ne0h commented 6 years ago

I fixed the error by setting "Gateway Type" in the binding settings from "Auto" to "CCU".

TMGFlyer commented 6 years ago

I have set the "Gateway Type" to "CCU" already very early during my investigations. This did not change anything

ne0h commented 6 years ago

You are right.. the connection is broken again. I am sorry!

Gugiman commented 6 years ago

Same here After upgrading to openhab 2.2 everything works fine. But then for some reasons all my things were gone. So I started to reconfiure all but I can't get the bridge online. First I thought the Firmware update of the CCU2 causes the problem. But after factory reset openhab still says no...

macmeck commented 6 years ago

Same here

@Gugiman, this might not be exactly the same issue. This issue describes the behaviour that the binding loses the connection to the CCU, the CCU runs into total craziness and the Things end up in an offline state. But the Things are not being removed from the configuration.

Gugiman commented 6 years ago

Thats right but things being removed is not my main point. Initializing of a new bridge fails instead of updateing to openhab 2.2 with an exisiting and functioning bridge from 2.1 which seems to work.

Maybe this helps fixing any bug

macmeck commented 6 years ago

@gerrieg, you are the maintainer of this binding, correct? Have you seen this behaviour before? Anything we can do in terms of debugging to bring this annoying issue forward?

gerrieg commented 6 years ago

First of all, the 2.2 version has a new connection tracker. The binding sends a ping and expects a pong on it. This works fine with my CCU2 and Raspberrymatic. For Homegear users, there is already a fix in the latest Homegear 0.8 snapshot. In the case of the CCU1 I can not say anything because I have none and can not test. The ping/pong behaviour should be the same based on the API description. Maybe someone can do a trace and try to analyse the ping/pongs? Another option of course is to switch to Raspberrymatic. Since version 20180120 it works flawlessly for me. A Rpi3 compared to a CCU1 is lightning speed.

macmeck commented 6 years ago

How would you perform a trace, what kind of information do you need?

gerrieg commented 6 years ago

See the docs for a trace: https://docs.openhab.org/addons/bindings/homematic/readme.html#debugging-and-tracing

You can send me a full trace log by mail if you like. But i can check it in one week at the eariest, because i am on vacation.

Gugiman commented 6 years ago

After setting the Callback Network Address to the remote IP address (192.168.[...] not 127.0.0.1) of my openhab instance everything is doing fine. Maybe it's the auto-discovery of the callback address which causes problems?

For now I haven't any disconnects since one week.

TMGFlyer commented 6 years ago

Finally I found the time to install the OH2.2 version again and to generate a log of the effect. I'm not sure, if this allows you to isolate the issue. If there are any questions, please let me know.

openhab.log

gerrieg commented 6 years ago

Thanks for the trace log. I see two things here.

  1. Everything looks good up to one point, where the CCU1 does not send a answer to the connection tracker anymore. I think at this time the CCU1 is overloaded. The connection tracker restarts the whole init procedure.

  2. There are many newDevice events. Normally they are ignored at initialization, but since they are being sent so late, the binding handles all these events which results in many API calls to the CCU1

So what can we do.

  1. Please try to configure a higher timeout value at the bridge, default is 15 seconds. So that the connection tracker listenes longer for a answer from the CCU1. Set it to 60 or even 120.

  2. Here i can implement a delay for the CCU1 to ignore the newDevice events at startup for a specified time.

But please try the timeout first and let me know.

TMGFlyer commented 6 years ago

I set the timeout value to 120 seconds, but unfortunately, no difference. Attached see the latest log file based on the 120 seconds timeout

grafik

openhab.log

gerrieg commented 6 years ago

I added a startup delay for newDevice events now: https://github.com/openhab/openhab2-addons/pull/3283

please try the 2.3 snapshot once the PR has been merged

Andreas-Fink commented 6 years ago

I‘m using Homegear 0.8.0-2001 (HG) and OpenHab 2.3.0~20180303002042-1 (OH) and see the same error that everything is working as expected and sometime after 1 or 2 hour and sometimes after days OH lost the connection to HG and reconnects, lost the connection again and reconnects again, ...

In the HG log I see a lot status requests from OH. At this point interacting with Homematic Devices still works but with an growing delay (up to minutes). And after some time HG crashes with the commend that more than 1000 packets at in the queue and that the packet processing is to slow.

When I restart OH while the reaction is delayed but still working, then everything is working again. When I missed this windows and HG is arguing about „over 1000 packets in queue“ the both OH and HG needs to be restarted to get all up and running again.

My environment is running on VMware ESXi based on a MacMini with a Intel Core i5 and 8GB Ram. OH and HG are have both their own VM with Debian 8, 2 vCPUs and 2GB Ram.

macmeck commented 6 years ago

PR #3283 has not been reviewed and is not yet merged. I guess we need to wait for that...

martinvw commented 6 years ago

Thanks for the hint, will take a look now :-)

Andreas-Fink commented 6 years ago

@macmeck I‘m aware of that. I just wanted to clarify with my Post that the problem still exists for Homegear also (after the HG 0.8. x-Patch mentioned @gerrieg) for Homegear. But since some minutes the PR is merged. I will update my OH Installation today evening when I‘m at home. If I can help at some point with this (Trace-Logs or anything else) just let me know.

Andreas-Fink commented 6 years ago

Thanks to @martinvw for the merge, and of course @gerrieg for the PR and the update itself!

Andreas-Fink commented 6 years ago

This time it stays very long up and running. But finally it crashed again. Homegear writes over and over again: „RPC Server (Port 2001): Error: There are too many clients connected to me. Closing incoming connection. You can increase the number of allowed connections in main.conf.“

TMGFlyer commented 6 years ago

I checked the "2.3.0. 201803101457" version. The system ran now some while longer, but after some hours it failed again. Please let me know, if you need any further tests or information

macmeck commented 6 years ago

bump any progress here?

TMGFlyer commented 6 years ago

no progress here. I still have to run OH2.1 to avoid a break down

gerrieg commented 6 years ago

Today I had some time to look through the code again to see if any connection is not closed. The only thing I found is a option to reuse the socket address and port that wasn't set. Maybe that will help you. Otherwise I can't think of anything else, I can't reproduce it in my environment.

PR: https://github.com/openhab/openhab2-addons/pull/3461

ifs-net commented 6 years ago

I face the same problems (oh2 & homegear). Is there a chance to get rid of this problem?

gerrieg commented 6 years ago

Any new here? Has anyone tried the current snapshot with my latest changes?

macmeck commented 6 years ago

I will give it a try later today...

macmeck commented 6 years ago

Good news so far, today's test run using the latest version showed 4 hours without any crashes of the homematic. Nevertheless, the log shows up that the connection is being lost and resumed every 15 seconds:

2018-04-18 00:31:31.307 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:31:31.442 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie' 2018-04-18 00:31:46.528 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:31:46.648 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie' 2018-04-18 00:32:01.649 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:32:01.869 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie' 2018-04-18 00:32:17.120 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:32:17.261 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie' 2018-04-18 00:32:32.322 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:32:32.448 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie' 2018-04-18 00:32:47.614 [WARN ] [ommunicator.AbstractHomematicGateway] - Connection lost on gateway 'homematic-homie' 2018-04-18 00:32:47.768 [INFO ] [ommunicator.AbstractHomematicGateway] - Connection resumed on gateway 'homematic-homie'

I will let this test run for some more time and gonna keep you posted.

macmeck commented 6 years ago

I think we're making progress here, still no crashes :-)

Concerning the log messages, can we somehow tackle down if the binding really loses the connection every 15 secs or if this is just some flaw in the logs? The log files are flooded by this messages and growing very fast...

gerrieg commented 6 years ago

@macmeck thanks for testing! Good news, i finally found the bug in the CCU1 connection tracker 😃 see PR https://github.com/openhab/openhab2-addons/pull/3498

TMGFlyer commented 6 years ago

I just started the test here. I have the same observation as macmeck. It's running, but looses connection every 15 seconds. I will keep it running for the night to provide the further experience tomorrow

gerrieg commented 6 years ago

@TMGFlyer my fix has been merged, try the next snapshot tomorrow

macmeck commented 6 years ago

There is no snapshot available yet, that includes the latest fix, right? At least, I haven't found one...

gerrieg commented 6 years ago

That's right, latest snapshot is 1259 from 18.04. Maybe there is a new one tomorrow.

macmeck commented 6 years ago

Okay, with the latest snapshot, the messages are gone... sweet thing! I will continue testing from here.

ifs-net commented 6 years ago

First thank you for your work with openhab and with the homegear binding. Is there an information available when the next official release (that will contain these changes) will be published? I just tried to downgrade to 2.1.0 but this failed due to other problems and I'm now switching back to 2.2. But at the moment openhab2 is hardly usable because all homematic / homegear devices do not work reliable..

TMGFlyer commented 6 years ago

Yesterday, I installed the latest snapshot OH2.3.0 Build#1264. I have it now running for 24 hrs without any hick-up, although I have set the timeout back to 15 sec. At a first glance, also the LOG looks fine. Continuous testing will be performed over the weekend.

Looks like you caught it, perfect job, thanks.