openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.86k stars 3.56k forks source link

[remoteopenhab] [Docker] SSE timeout occurred while receiving events #8977

Closed bwosborne2 closed 3 years ago

bwosborne2 commented 3 years ago

Expected Behavior

I expect the remote bridge to remain ONLINE. This is between 2 docker installations on the same host.

Current Behavior

Bridge initially comes online and then errors out.

events.log

2020-11-07 12:43:19.496 [INFO ] [me.event.ThingStatusInfoChangedEvent] - Thing 'remoteopenhab:server:172_17_0_3' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Error occurred while receiving events
2020-11-07 12:47:19.492 [INFO ] [smarthome.event.ThingUpdatedEvent   ] - Thing 'remoteopenhab:server:172_17_0_3' has been updated.
2020-11-07 12:47:19.498 [INFO ] [me.event.ThingStatusInfoChangedEvent] - Thing 'remoteopenhab:server:172_17_0_3' changed from OFFLINE (COMMUNICATION_ERROR): Error occurred while receiving events to ONLINE
2020-11-07 12:48:19.605 [INFO ] [me.event.ThingStatusInfoChangedEvent] - Thing 'remoteopenhab:server:172_17_0_3' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Error occurred while receiving events

Possible Solution

Steps to Reproduce (for Bugs)

  1. Install openHAB3 & openHAB2 in docker on the same host ( I mapped the OH2 port to 8181)
  2. Add the` remoteopenhab binding to OH3
  3. Discover & add the OH2 bridge 4, Wait for the errors
  4. No Profit :(

Context

I am testing openHAB3 and possible migration scenarios from OH2. This will become mor eimportant after the year-end release of OH3.

Your Environment

Raspberry Pi 3B+ with Raspbian Lite Buster OH2 2.5.20 OH3 3.0.0 snapshot build 2004

Server: Containers: 2 Running: 2 Paused: 0 Stopped: 0 Images: 2 Server Version: 19.03.13 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Native Overlay Diff: true Logging Driver: json-file Cgroup Driver: cgroupfs Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: runc Default Runtime: runc Init Binary: docker-init containerd version: 8fba4e9a7d01810a393d5d25a3621dc101981175 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd init version: fec3683 Security Options: seccomp Profile: default Kernel Version: 5.4.72-v7+ Operating System: Raspbian GNU/Linux 10 (buster) OSType: linux Architecture: armv7l CPUs: 4 Total Memory: 924.8MiB Name: raspberrypi ID: ULMP:4UP7:TDLA:TRUY:5YF6:5726:GWSM:5LX4:6BL3:WXVU:DHQA:NSPJ Docker Root Dir: /var/lib/docker Debug Mode: false Registry: https://index.docker.io/v1/ Labels: Experimental: false Insecure Registries: 127.0.0.0/8 Live Restore Enabled: false

* Operating System and version

uname -a Linux raspberrypi 5.4.72-v7+ #1356 SMP Thu Oct 22 13:56:54 BST 2020 armv7l GNU/Linux

lolodomo commented 3 years ago

Please enable and show DEBUG logs for the binding.

lolodomo commented 3 years ago

@wborn : do you know any potential problem by using SSE between 2 docker containers ?

bwosborne2 commented 3 years ago

Please enable and show DEBUG logs for the binding.

A little bit of a newbie question, but how do I start the console in oh3? The openhab-cli command does not appear to be available.

lolodomo commented 3 years ago

Doc is here: https://www.openhab.org/docs/administration/console.html

lolodomo commented 3 years ago

And please add "docker" In the issue title.

bwosborne2 commented 3 years ago

add "docker" In the issue title

Done.

I finally got debug logs. I restarted the binging, it connected & then dropped. There are currently no items defined but I saw the same behaviour with Items defined. debug.log

lolodomo commented 3 years ago

Ok, this is easier with logs ! As you can see, the error is a timeout exactly after one minute. If you linked no item, there is nothing received during one minute. I certainly must handle the SSE timeout and maybe increase it. I will try to first reproduce and then fix.

bwosborne2 commented 3 years ago

The load average in the Pi is minimal & the communication occurs over the internal Docker network on the device. There should not be that long a timeout IMHO.

root@raspberrypi:/opt/openhab/userdata/logs# uptime
 10:23:29 up 1 day, 17:00,  1 user,  load average: 0.00, 0.00, 0.00
top - 10:22:40 up 1 day, 16:59,  1 user,  load average: 0.00, 0.01, 0.00
Tasks: 116 total,   1 running, 115 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.2 us,  0.4 sy,  0.0 ni, 99.3 id,  0.0 wa,  0.0 hi,  0.1 si,  0.0 st
MiB Mem :    924.8 total,     86.6 free,    533.7 used,    304.5 buff/cache
MiB Swap:    100.0 total,     59.0 free,     41.0 used.    333.6 avail Mem

The only other service running is HABApp installed in a Python venv pointed at OH3.

lolodomo commented 3 years ago

I think the timeout occurs simply because nothing is received from your OH2 server. Have you items updated on your remote server? I am setting no timeout in the code, so I suppose one minute is the default. A reconnection is implementated but the job is runnning only every 5 minutes.

lolodomo commented 3 years ago

Please rename again the issue to mention SSE timeout.

bwosborne2 commented 3 years ago

Please rename again

Done I remember having the same issue with Items defined & linked to OH3 Items. Let me try that again.

My ultimate plan would be to migrate from OH2 installed natively to OH3 installed in a Docker container on my Debian VM.

bwosborne2 commented 3 years ago

Added 2 items to OH2 but did not link channels to OH3 items. Timed out again.

debug2.log

lolodomo commented 3 years ago

Ok, I am able to reproduce the timeout error in case I subscribe to a topic that is not "talking". I definitively have to handle this timeout.

bwosborne2 commented 3 years ago

Is this a docker-specific issue or a more general issue?

lolodomo commented 3 years ago

General issue. I have to disable the reading timeout on the client side. This will be fixed this evening.

I did not encountered this problem because my server is updating items at least one time per minute, which is probably the case for most users.

bwosborne2 commented 3 years ago

Thanks. The particular astro items I have defined only update once per day. Let me know when I can test in a Docker snapshot.

lolodomo commented 3 years ago

Sorry, I have absolutely no idea when and how builds for docker are produced.

bwosborne2 commented 3 years ago

I believe that are produced as part of the snapshot release process.