openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.9k stars 3.59k forks source link

[Loxone binding] communication freeze #8693

Closed Davidoff77test closed 3 years ago

Davidoff77test commented 4 years ago

Expected Behavior

Communication continue and loxone binding keeps read/write value from/to loxone

Current Behavior

After a few days loxone binding stops communication. In log you can see this: _2020-10-08 02:32:40.434 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 32 seconds. 2020-10-08 02:33:12.373 [DEBUG] [.binding.loxone.internal.LxWebSocket] - [1] Sending unencrypted string: keepalive 2020-10-08 02:33:12.376 [DEBUG] [inding.loxone.internal.types.LxState] - State set (152A405E-022C-6A42-FFFFBB45C369C8A5,value) control (152A405E-022C-6A42-FFFFBB45C369C8A5,Aktuální příkon sítě) value=309.0 2020-10-08 02:33:12.422 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 120 seconds. 2020-10-08 02:33:40.411 [DEBUG] [inding.loxone.internal.types.LxState] - State set (152A405E-022C-6A3C-FFFFBB45C369C8A5,value) control (152A405E-022C-6A3C-FFFFBB45C369C8A5,Aktuální příkon domu) value=329.0 2020-10-08 02:33:40.414 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 92 seconds. 2020-10-08 02:33:42.396 [DEBUG] [inding.loxone.internal.types.LxState] - State set (152A405E-022C-6A42-FFFFBB45C369C8A5,value) control (152A405E-022C-6A42-FFFFBB45C369C8A5,Aktuální příkon sítě) value=329.0 2020-10-08 02:33:42.397 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 90 seconds. 2020-10-08 02:33:43.615 [DEBUG] [inding.loxone.internal.types.LxState] - State set (151DD295-005A-712A-FFFF8A4870419E06,value) control (151DD295-005A-712A-FFFF8A4870419E06,Venkovní teplota) value=11.100000000000001 2020-10-08 02:33:43.616 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 89 seconds. 2020-10-08 02:33:43.806 [DEBUG] [inding.loxone.internal.types.LxState] - State set (151DD295-005A-712E-FFFF8A4870419E06,value) control (151DD295-005A-712E-FFFF8A4870419E06,Teplota v 13 nádrže) value=43.1 2020-10-08 02:33:43.808 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 89 seconds. 2020-10-08 02:34:12.464 [DEBUG] [inding.loxone.internal.types.LxState] - State set (152A405E-022C-6A42-FFFFBB45C369C8A5,value) control (152A405E-022C-6A42-FFFFBB45C369C8A5,Aktuální příkon sítě) value=299.0 2020-10-08 02:34:12.466 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 60 seconds. 2020-10-08 02:34:40.500 [DEBUG] [inding.loxone.internal.types.LxState] - State set (152A405E-022C-6A3C-FFFFBB45C369C8A5,value) control (152A405E-022C-6A3C-FFFFBB45C369C8A5,Aktuální příkon domu) value=299.0 2020-10-08 02:34:40.502 [DEBUG] [ding.loxone.internal.LxServerHandler] - [1] Sleeping for 32 seconds.

But after this log there is no log from loxone binding (even there is no other keepalive command)

Steps to Reproduce (for Bugs)

  1. Run loxone binding
  2. Wait a few days without restart of openhabian

Your Environment

Openhab 2.5.9 - stable running as openhabian on rPi4 4GB RAM

ppieczul commented 4 years ago

Hi @Davidoff77test. Since I am running this binding on my openhabian for months without restart, this is not straight to reproduce. What about other system components? Is there anything else in the logs? Do other bindings function? What is the status of the free memory?

Davidoff77test commented 4 years ago

Hi, every other binding runs ok, but the loxone stops. Ram is ok, there is more than 2GB ram free. There is also no logs from loxone binding since the time at log I send until I restart service. I don't know if helps to change log from debug to trace, but there isn't every error which is strange.

Milhauzz commented 4 years ago

Hello hello, same issue here and sorry @ppieczul ppieczul for emailing you this morning didn't know this tracker. Have Openbian 1.6 (openHAB 2.5.9-1), Loxone server 1st gen 11.1.9.14 version. Communication is only one way, from miniserver to OH, but not updating items. In other way, nothing works, only after reboot (stopping and starting binding only has no effect) - or when procced a thing deletion and adding new miniserver.

What is crazy, that debug log show nothing. Maybe I missed something and will look deeper to logs, but so not have anything more right now as Davidoff has.

Milhauzz commented 4 years ago

Ha, found more than one interesting things, look at the top of the banner. Unable to parase IP. Wifi is off, running dhcp config, IP in OH settings is set:

############################################################################### ############### openHABianDevice ############################################ ###############################################################################

Ip = Unable to parse ip . Please debug.

Release = Raspbian GNU/Linux 10 (buster)

Kernel = Linux 5.4.51-v7l+

Platform = Raspberry Pi 4 Model B Rev 1.1

Uptime = 0 day(s). 0:6:54

CPU Usage = 1.52% avg over 4 cpu(s) (4 core(s) x 1 socket(s))

CPU Load = 1m: 0.11, 5m: 0.52, 15m: 0.36

Memory = Free: 0.12GB (14%), Used: 0.76GB (86%), Total: 0.88GB

Swap = Free: 1.66GB (100%), Used: 0.00GB (0%), Total: 1.66GB

Root = Free: 2.20GB (33%), Used: 4.47GB (67%), Total: 7.00GB

Updates = 0 apt updates available.

Sessions = 1 session(s)

Processes = 125 running processes of 32768 maximum processes

###############################################################################

And as I can see, Systeminfo binding is not working. So I presume, Loxone binding is ok and there is major mistake in OH release.

Davidoff77test commented 4 years ago

There is everything right on my Openhabian

############################################################################### ############### openhab ##################################################### ###############################################################################

Ip = 192.168.1.15

Release = Raspbian GNU/Linux 10 (buster)

Kernel = Linux 5.4.51-v7l+

Platform = Raspberry Pi 4 Model B Rev 1.1

Uptime = 4 day(s). 22:49:10

CPU Usage = 3.78% avg over 4 cpu(s) (4 core(s) x 1 socket(s))

CPU Load = 1m: 0.27, 5m: 0.28, 15m: 0.31

Memory = Free: 2.77GB (73%), Used: 1.01GB (27%), Total: 3.78GB

Swap = Free: 0.09GB (100%), Used: 0.00GB (0%), Total: 0.09GB

Root = Free: 21.49GB (77%), Used: 6.29GB (23%), Total: 28.99GB

Updates = 0 apt updates available.

Sessions = 1 session(s)

Processes = 118 running processes of 32768 maximum processes

###############################################################################

Milhauzz commented 4 years ago

As I rember I found some unstability of Lox binding and new openhabian version about one month ago, can't remeber what I exactly did, but sometimes it was not working properly. Maybe there is some dependency on all installed bindings and current OH version. But I'm not prorammer or debugger, only wintel scripting guy :)

So I will do update/upgrade within hour to latest version, then will try to "downgrade", for example to OpenHabian 1.5.

ollietb commented 4 years ago

I updated to the latest version of OH 2 weeks ago and my loxone->OH connection has just dropped. It was running on the previous version with no problems for about 1 year. I can't see anything obvious in the logs.

ollietb commented 4 years ago

Restarting my openhab device seems to reconnect

Davidoff77test commented 4 years ago

Hi, loxone binding still freezing. Last after a day. Othes addon works correctly. Just restart openhab helps, but it isn't solution because I control for example heating in bathroom via loxone bathrooom temperature.

Milhauzz commented 4 years ago

Hello, I did almost a month ago a clean instalation from OpenHabian 1.5 and then update instead of 1.6 clean instalation where Loxone binding was dropping connection. Now it seems that everything is ok and there is scheduled restart every morning 3AM as a maintenance.

ollietb commented 3 years ago

It is still freezing for me too. I've created a cronjob to restart openhabian every monday, hopefully that will fix it.

Davidoff77test commented 3 years ago

Same for me. But cronjob? It's not acceptable workaround.

ppieczul commented 3 years ago

When the communication freezes, what is the status of Loxone thing in paper UI and over CLI?

Milhauzz commented 3 years ago

As I remember it's simply offline. Debug log don't show anything extra so there is unfortunately nothing to debug. Is there any limitation for binding such a number of items or something like that? I had them plenty, now I'm trying to simplify some stuff to move all possible rules to Loxone only and it seems that with cron restart every morning it's stable (the offline happened only one time). Pawel - is there something what could help you more? Some log / settings / versions of Loxone stuff and more? Thank you.

ppieczul commented 3 years ago

There is nothing specific to the binding, but maybe we are hitting some corner case in the framework. Next to offline status there should be some text reason explaining why it went offline, if it was initiated by the binding. Also any normal termination of the binding or it going offline has some accompanying logs. If the logs just stop, it looks like maybe something happened to the binding thread. Here is what you can do: log into openhab console (default password is habopen): openhab-cli console run command: threads --list >list.txt run command: threads --monitors --locks -e >threads.txt exit console, cd /var/lib/openhab2 and send me the two created files there: list.txt, threads.txt Maybe they will tell something about the condition of the threads. Thanks Pawel

Milhauzz commented 3 years ago

Good morning, just sent a files. Maybe I missed something when reading the logs after lox binding stopped working in past. But firstly please have a look into the files I sent, maybe you will see something on the first sight, who knows.

Unfortunately I do not have the logs right now where lox bindig stopped working, they are overwrited already. If there will be a demand I can do following: 1, stop maintenance cronjob for restart 2, set lox biding log level to debug 3, wait ... 4, share the logs 5, wait for miracle

Anyway thanks in advance !

ppieczul commented 3 years ago

Are those logs that you sent with the failure or without?

Milhauzz commented 3 years ago

Without.

ppieczul commented 3 years ago

Great. So now please get the same when it fails. Hopefully it will tell us something what is happening. Thanks a lot

Davidoff77test commented 3 years ago

Hi, ppieczul, I just sent these two files (threads.txt, list.txt) on your mail. These files was copied in freeze state. Thx in advance

ppieczul commented 3 years ago

I see where the problem is, it is a deadlock between processing incoming binary message and sending keep alive message. I will work on a fix during the weekend. Thanks for the data!

ollietb commented 3 years ago

It's just crashed again for me and I've got the log files you asked for. I sent them to your email.

Milhauzz commented 3 years ago

That's excelent what's happened here. Luckily I'm stable all the time. Thanks all for participating on this issue.

Davidoff77test commented 3 years ago

Yes, deadlock is a programmer nightmare :)

ppieczul commented 3 years ago

Can you please test this version: https://www.dropbox.com/s/w4zxn25ogzyatbz/org.openhab.binding.loxone-2.5.12-SNAPSHOT.jar?dl=0 You need to remove the standard binding (uninstall) from the paper UI, no need to remove thing before. Then go to openhab CLI and check that there are really no bindings:

bundle:list | grep Loxone

If there are then remove them manually

bundle:uninstall <ID>

Install manually UPNP dependency:

feature:install openhab-transport-upnp

Then copy the Loxone binding JAR to

/usr/share/openhab2/addons

Then check that OH picked this up again by listing bundle. Proper version should show this:

<ID> │ Active │  80 │ 2.5.12.202012181428     │ openHAB Add-ons :: Bundles :: Loxone Binding

Let it run and see if it is any better.

Davidoff77test commented 3 years ago

Done. We will see. Thank you

ppieczul commented 3 years ago

What do you see with the fix?

W dniu pt., 18.12.2020 o 15:58 Davidoff77test notifications@github.com napisał(a):

Done. We will see. Thank you

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/openhab/openhab-addons/issues/8693#issuecomment-748129457, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOHS7R5JMLO3POVXT2BSNDSVNU2JANCNFSM4SIIJGXQ .

Davidoff77test commented 3 years ago

Fix still looks stable. But it's just 4 days. Wait week at least.

Davidoff77test commented 3 years ago

Hi, it's stable for 14days. Good job.

Milhauzz commented 3 years ago

Testing right now and it seems that everything is all right. Thank you very much!

ollietb commented 3 years ago

What is the recommended way to install this update? Wait for it to be merged into an openhab release? Or follow the instructions in https://github.com/openhab/openhab-addons/issues/8693#issuecomment-748123780

ppieczul commented 3 years ago

@ollietb if you install jar you will have the fix now. It is merged into main, but not 2.5 yet.

Davidoff77test commented 3 years ago

Just for info. There is fix in version 3.0.0?

ollietb commented 3 years ago

I just updated my openhab from 2.5 to 3.0.1 using the openhab config tool and the loxone connection still goes down after a few days until I login and reboot the device.

ppieczul commented 3 years ago

Please post logs, thanks. The fix should be in version 3.0.1. Correction: fix is NOT in 3.0.1, it is earliest in 3.1.0.M1.

czw., 4 mar 2021 o 17:51 Ollie Harridge notifications@github.com napisał(a):

I just updated my openhab from 2.5 to 3.0.1 using the openhab config tool and the loxone connection still goes down after a few days until I login and reboot the device.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openhab/openhab-addons/issues/8693#issuecomment-790761898, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACOHS7SUI4RIJHXSZOAMIG3TB63AZANCNFSM4SIIJGXQ .

ollietb commented 3 years ago

OK thanks I've emailed you the logs

ppieczul commented 3 years ago

To all that need that fix - it is NOT included in 3.0.1. The earliest version containing this fix is 3.1.0.M1.