openhab / openhab-addons

Add-ons for openHAB
https://www.openhab.org/
Eclipse Public License 2.0

[network] (Docker) Still massively slowing down boot when extended default-address-pools are used in /etc/docker/daemon.json #16810

Open SHU-red opened 3 months ago

SHU-red commented 3 months ago

Expected Behavior

When openHAB boots, the pingdevices should be initialized relatively quickly. The network binding things should be editable when clicking on them and not keep loading forever. The network binding should not slow down the discovery/initialization of other things.

Current Behavior

Sorry for re-opening this, already referenced in:

As seen/documented in the linked discussions, I'm pretty sure that the combination

I even upgraded my server recently to basically gaming PC hardware, and even then I am experiencing

I uninstalled the network binding and everything is fine

Possible Solution

Not sure, because all of this was already discussed in the other topics, but the overall result was something like "with the current version things should be better".

To repeat already discussed solutions:

I have no advanced knowledge of these things, but I feel like there could be something like an additional setting in the "advanced" section of the network binding that drastically reduces the number of scanned IPs to a normal range.

My hardware is far more powerful than openHAB needs, yet this significantly slows down boot and interferes with general openHAB stability.

Steps to Reproduce (for Bugs)

  1. Use openHAB docker in network_mode: host
  2. Install network binding
  3. Extend default-address-pool
  4. Boot

Context

daemon.json

{
   "default-address-pools": [
        {
            "base":"172.17.0.0/12",
            "size":16
        },
        {
            "base":"192.168.0.0/16",
            "size":20
        }
    ]
}

Your Environment

wborn commented 1 month ago

Did you limit the network interfaces on your ping devices? This config option was added in https://github.com/openhab/openhab-addons/pull/16145.

SHU-red commented 1 month ago

Currently I have uninstalled the network binding and I am sending my magic packets via Node-RED.

Can you just give me a very quick description/link on what to try? I tried to follow your link and look at the PR, but I'm not sure how I should try it.

Is it

  1. Re-install network binding
  2. Limit IP-Ranges in network binding settings
  3. Try

I would love to be able to set a network address range

wborn commented 1 month ago

You can limit the interfaces to use in the Advanced settings of your Thing:

[Screenshot: config]

SHU-red commented 1 month ago

Understood. I will test it and give feedback. Thanks for caring.

lsiepel commented 1 month ago

> Currently I have uninstalled the network binding and I am sending my magic packets via Node-RED.
>
> Can you just give me a very quick description/link on what to try? I tried to follow your link and look at the PR, but I'm not sure how I should try it.
>
> Is it
>
>   1. Re-install network binding
>   2. Limit IP-Ranges in network binding settings
>   3. Try
>
> I would love to be able to set a network address range

  1. Your subnets are huge. The /12 has about 1 million addresses; I can't think of why you would want it that large in a private setting. A /24 or /22 is usually enough. The second one, a /16, is also huge. I would expect more issues sooner or later that have nothing to do with openHAB.
  2. The added configuration option is documented. You can set/limit the interfaces in the thing configuration; where exactly depends on whether you use the UI or file-based config.

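
For a sense of scale, the prefix lengths mentioned above can be compared with Python's standard `ipaddress` module. This is only an illustrative sketch: the two `10.0.0.0` networks are arbitrary private ranges chosen for the arithmetic (an assumption, not taken from anyone's setup), while the first two are the pool bases from the daemon.json in this issue.

```python
import ipaddress


def address_count(cidr: str) -> int:
    """Number of addresses covered by a CIDR block."""
    # strict=False tolerates a base with host bits set, e.g. 172.17.0.0/12.
    return ipaddress.ip_network(cidr, strict=False).num_addresses


for cidr in ("172.17.0.0/12", "192.168.0.0/16", "10.0.0.0/22", "10.0.0.0/24"):
    print(f"{cidr:>16}: {address_count(cidr):>9,} addresses")
```

The /12 covers 1,048,576 addresses and the /16 covers 65,536, compared with 1,024 for a /22 and 256 for a /24.
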
SHU-red commented 1 month ago

Thanks for your input. I will also check reducing my subnets.

SHU-red commented 1 month ago

Setting the interface for network devices seems to solve the "openHAB things freezing" problem.

I still see a high CPU load from openHAB on my server. This may be due to what @lsiepel describes.

Hoping this is not too off topic:

{
  "default-address-pools" : [
    {
      "base" : "172.17.0.0/12",
      "size" : 16
    },
    {
      "base" : "192.168.0.0/16",
      "size" : 20
    }
  ]
}

My problem was that only 31 networks are allowed with this config. I am running more than 31, which is why I need to extend this configuration.

As a reminder, I'm currently running

{
    "default-address-pools": [
        {
            "base": "172.17.0.0/12",
            "size": 16
        },
        {
            "base": "192.168.0.0/16",
            "size": 20
        },
        {
            "base": "10.99.0.0/16",
            "size": 24
        }
    ]
}

Do you have any good suggestions for using the default config as a basis and just extending it with something like the 10.99.0.0/16 address range?

Sorry for asking so many questions; I feel very unsure about this topic.
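
To see how many networks each pool can provide, the pool definitions above can be run through a short calculation. This is a rough sketch using only Python's standard library; it computes the nominal number of `size`-length subnets per pool and does not model how Docker actually allocates them, so real limits may be slightly lower.

```python
import ipaddress


def networks_in_pool(base: str, size: int) -> int:
    """Nominal number of /size networks that fit into one address pool."""
    # strict=False accepts a base with host bits set, e.g. 172.17.0.0/12.
    pool = ipaddress.ip_network(base, strict=False)
    if size < pool.prefixlen:
        raise ValueError(f"a /{size} network does not fit into {pool}")
    return 2 ** (size - pool.prefixlen)


# The three pools from the daemon.json shown above.
pools = [("172.17.0.0/12", 16), ("192.168.0.0/16", 20), ("10.99.0.0/16", 24)]
for base, size in pools:
    print(f"{base} split into /{size}: {networks_in_pool(base, size)} networks")
print("total:", sum(networks_in_pool(b, s) for b, s in pools))
```

The first two pools come out at a nominal 16 networks each, close to the 31 quoted above (the difference of one is presumably down to how Docker allocates from the first pool, which is not modelled here). The 10.99.0.0/16 pool with size 24 nominally adds 256 networks of 256 addresses each.
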

SHU-red commented 1 month ago

One more addition: I just tried to monitor my CPU load, and while doing so I uninstalled the network binding again.

[Screenshot: 2024-07-13_13-27]

As you see

So I'm guessing I was wrongly blaming the network binding for this utilization. I think this issue is closed, and I have to check somehow whether the CPU load is justifiable or whether there is something else I have to dig into.

Thank you

SHU-red commented 1 month ago

Sorry in advance for these many messages. One last thing I did after uninstalling the network binding was to restart the openHAB Docker container.

And as you can see, the utilization of OH has dropped drastically.

So I'm back at the status of my reply above: https://github.com/openhab/openhab-addons/issues/16810#issuecomment-2226840631

[Screenshot: 2024-07-13_13-34]

wborn commented 1 month ago

Maybe your tool can show the CPU usage per thread, similar to top -H? The threads have names, so those can help to identify what is consuming your CPU cycles.
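
If the monitoring tool has no per-thread view, roughly the same information can be read from `/proc` on a Linux host. The sketch below is an assumption-laden illustration, not part of openHAB: it expects Linux, needs the PID of the openHAB Java process (the demo at the bottom just inspects the script itself), and reports cumulative CPU clock ticks per thread rather than a live percentage. The kernel also truncates thread names to 15 characters.

```python
import os
from pathlib import Path


def parse_stat(line: str) -> tuple[str, int]:
    """Return (thread name, utime + stime in clock ticks) from one stat line."""
    # The name sits in parentheses and may itself contain spaces or ')'.
    start = line.index("(")
    end = line.rindex(")")
    name = line[start + 1:end]
    fields = line[end + 2:].split()
    # fields[0] is stat field 3 (state); utime and stime are fields 14 and 15.
    return name, int(fields[11]) + int(fields[12])


def busiest_threads(pid: int, top: int = 10) -> list[tuple[str, int]]:
    """Threads of a process, sorted by CPU time consumed so far."""
    totals = []
    for stat in Path(f"/proc/{pid}/task").glob("*/stat"):
        try:
            totals.append(parse_stat(stat.read_text()))
        except (OSError, ValueError, IndexError):
            continue  # thread exited or the line was unreadable
    return sorted(totals, key=lambda t: t[1], reverse=True)[:top]


if __name__ == "__main__":
    # Demo on this script's own process; pass the openHAB PID instead.
    for name, ticks in busiest_threads(os.getpid()):
        print(f"{ticks:>10} ticks  {name}")
```

Running it twice a few seconds apart and comparing the tick counts shows which named threads are currently consuming CPU.
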

jsetton commented 1 month ago

I have the same issue with the final release of OH 4.2 running in a Docker container using host network. There is still a huge delay at start-up for the configured pingdevice things to show up, sometimes up to an hour to get the related things up. The same happens if I try to add a new pingdevice thing. First, the add new thing page takes a very long time to be displayed. Once added, it takes a while before it shows up as well, and when it's available, it takes a while for the thing properties to show up.

I did everything listed above to limit the network discovery scope without any success. I decreased the Docker default address pools size to 24. I also selected the network interfaces to use for each of my configured pingdevice things. The only difference is that the CPU is pretty much idling when the delay is happening. Looking at trace logs for the binding, not much is happening during that time either. Each delay seems to be related to some kind of timeout that isn't logged. One last thing to point out is that after the initial delay for the pingdevice thing to become available, its status seems to be updated as expected. So it is mostly an initialization/configuration issue.

I also ran a simple test using a new Docker container using host network and only installing the network binding. I experienced no issue up to OH 4.1.0, while I did from OH 4.1.1 onwards. When I ran the container on its own network, the issue no longer happened. For reference, I have 33 network interfaces on my Docker host.
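
With host networking, the container sees every interface of the Docker host. A quick way to count them and see how many are likely Docker-created is sketched below. The name prefixes are an assumption based on the usual Linux naming (`docker0`, `br-<network id>`, `veth<id>`); other setups may name interfaces differently.

```python
import socket

# Assumption: typical Linux/Docker interface naming; adjust as needed.
VIRTUAL_PREFIXES = ("docker", "br-", "veth")


def split_interfaces(names):
    """Split interface names into (likely Docker-created, everything else)."""
    virtual = [n for n in names if n.startswith(VIRTUAL_PREFIXES)]
    other = [n for n in names if not n.startswith(VIRTUAL_PREFIXES)]
    return virtual, other


if __name__ == "__main__":
    # if_nameindex() returns (index, name) pairs for all interfaces.
    names = [name for _, name in socket.if_nameindex()]
    virtual, other = split_interfaces(names)
    print(f"{len(names)} interfaces in total")
    print("likely Docker-created:", ", ".join(virtual) or "none")
    print("other:", ", ".join(other) or "none")
```

The names in the "other" group are the candidates to enter in the interface setting of the pingdevice things.
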

openhab-bot commented 1 month ago

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/network-binding-things-are-slow-to-start-and-slow-to-load-in-the-settings-ui/157001/8

SHU-red commented 1 month ago

Sorry for taking my time. I was very busy and wanted to observe this for a few days.

I have no idea why this problem is so hard to chase, but since setting the interfaces to only my network interface, it seems that the load is OK, ping devices are working, and opening the thing settings is ... at least OK.

I want to highlight again at this point that I have a very powerful home server. Even running this machine, I can still feel a lag when opening the pingdevices.

So I can understand the example of @jsetton, especially if you are running a lower-powered machine.

Running:

runtimeInfo:
  version: 4.2.0
  buildString: Release Build
locale: de-DE
systemInfo:
  configFolder: /openhab/conf
  userdataFolder: /openhab/userdata
  logFolder: /openhab/userdata/logs
  javaVersion: 17.0.11
  javaVendor: Debian
  osName: Linux
  osVersion: 6.8.5-301.fc40.x86_64
  osArchitecture: amd64
  availableProcessors: 16
  freeMemory: 639302192
  totalMemory: 1593835520
  uptime: 196953
  startLevel: 100
addons:
  - binding-astro
  - binding-chromecast
  - binding-fronius
  - binding-http
  - binding-ipcamera
  - binding-lgwebos
  - binding-luxtronikheatpump
  - binding-mqtt
  - binding-network
  - binding-ntp
  - binding-openweathermap
  - binding-shelly
  - binding-somfytahoma
  - binding-telegram
  - binding-wifiled
  - binding-wled
  - misc-metrics
  - misc-openhabcloud
  - persistence-influxdb
  - transformation-jsonpath
  - ui-habot
clientInfo:
  device:
    ios: false
    android: false
    androidChrome: false
    desktop: true
    iphone: false
    ipod: false
    ipad: false
    edge: false
    ie: false
    firefox: true
    macos: false
    windows: false
    cordova: false
    phonegap: false
    electron: false
    nwjs: false
    webView: false
    webview: false
    standalone: false
    pixelRatio: 1
    prefersColorScheme: dark
  isSecureContext: false
  locationbarVisible: true
  menubarVisible: true
  navigator:
    cookieEnabled: true
    deviceMemory: N/A
    hardwareConcurrency: 12
    language: en-US
    languages:
      - en-US
      - en
    onLine: true
    platform: Linux x86_64
  screen:
    width: 3440
    height: 1440
    colorDepth: 24
  support:
    touch: false
    pointerEvents: true
    observer: true
    passiveListener: true
    gestures: false
    intersectionObserver: true
  themeOptions:
    dark: dark
    filled: true
    pageTransitionAnimation: default
    bars: light
    homeNavbar: default
    homeBackground: default
    expandableCardAnimation: default
    blocklyRenderer: null
  userAgent: Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0
timestamp: 2024-07-17T19:36:13.502Z

jsetton commented 1 month ago

> So i can understand the example of @jsetton especially if you are running a lower-powered machine

I don't think this is related to a resource issue since there is a clear difference in behavior in the same environment between OH versions. Something changed between OH 4.1.0 and 4.1.1. Also, it may not be solely related to running OH in a Docker container as per the forum post linked above.

The network binding had a patch applied in 4.1.1 related to network interfaces filtering parameter but looking at the code, I can't see what could cause that delay issue. The only reason I can see is some library update from upstream. Hopefully @wborn would have a better insight on this.

yndtrud commented 1 month ago

Hello dear developers!

It's me! Help me!

openHAB versions 4.1.0 and previous ones work great and do not cause any problems. When I update to 4.1.1, 4.2.0 or any other versions (including snapshot and milestone), I always get a problem with Network Binding.

I've been struggling with network binding for 7 months now!!! I need a doctor!

My system: ASUS PN41 PE (Pentium Silver N6000 1.10GHz / 8GB RAM / SSD) / WinSrv2022. Latest OH v4.2.0, Azul JDK v17 (latest), Mosquitto v2.0.18, Z2M v1.39.0, PostgreSQL v14.10 + rrd4j (OH built-in)

IP address 192.168.13.5/24 [eth5] is selected in openHAB (IPv6 is disabled), the broadcast address 192.168.13.255 is specified.

The system has LAN and Wi-Fi adapters installed. The server is connected to the network via LAN. The server also has the Hyper-V (virtual machines) role installed, so there are two virtual adapters: one for the internal network of the virtual machines, and a second, virtual one on top of the physical LAN (the system works through this adapter; it is the standard connection for Hyper-V).

[Screenshot: 2024-07-25_134728]

Through the network binding I monitor:

  1. Internet connection status (server ping by DNS name).
  2. Status of hosts on the network: computers, TVs, routers, smart devices; 26 devices in total (ping by device IP).
  3. Status of services on the server: 9 services (SQL server, mail server, zigbee2mqtt and others), checked by connecting to the application port (server machine local IP 127.0.0.1).

Some Network Binding things: [Screenshot: 2024-07-25_122543]

Network device common config: [Screenshot: 2024-07-25_122653]

Server service standard config: [Screenshot: 2024-07-25_122628]

What works wrong in the network binding:

  1. If the Wi-Fi adapter and/or the internal network adapter of the virtual machines is enabled, initialization of openHAB (System reached start level 100) after startup takes up to 5 minutes if you specify the interface through which to ping devices, and up to 10-15 minutes if you do not.
  2. When the specified network adapters are enabled, the properties of the things open very slowly in the web interface: at least 5-6 seconds, or more.
  3. When pinging or connecting to a port (service discovery), the OFF state is often detected although it should be ON. In version 4.1.0 everything is stable; in newer versions, determining the state of the host/service does not work correctly (random OFF states). The state of the other adapters (Wi-Fi or virtual machine adapters) does not matter; the problem always remains.

Temporarily, I disabled the unnecessary network adapters. In the settings of the things, I specified the interface on which they should work: eth5 (for network devices) or lo0 (for services on the server).

[Screenshot: 2024-07-25_122412]

Now openHAB v4.2.0 loads and shows the state of the network binding things quickly (with the unnecessary network adapters turned off!), but determining the state of hosts and services is not stable (many OFF states).
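
One way to narrow down whether the random OFF states come from the binding or from the network itself is to run an independent TCP check next to openHAB and compare the results. The following is a minimal sketch using only the Python standard library; the host/port pair in the demo (127.0.0.1:1883, the usual Mosquitto port) is an assumption for illustration, and the check is not how the network binding itself probes services.

```python
import socket
from datetime import datetime


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Hypothetical services to probe; replace with the real host/port pairs.
    services = [("127.0.0.1", 1883)]
    for host, port in services:
        state = "ON" if port_is_open(host, port) else "OFF"
        print(f"{datetime.now().isoformat(timespec='seconds')} {host}:{port} {state}")
```

If this check stays ON while the corresponding thing flips to OFF, that points towards the binding rather than the service or the network.
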

v4.1.0 (was fine) and v4.2.0 (random OFF). Solid means ON, empty means OFF: [Screenshot: 2024-07-25_134218_1]

v4.2.0 (image enlarged for convenience): [Screenshot: 2024-07-25_134312_2]

I do not know what to do. I would like to use the latest version of the system, and not stay on 4.1.0. Maybe I can help somehow? Well, help!

My old topic for 4.1.1 update

lsiepel commented 1 month ago

Would you be able to get a debug log for the full 10-15 mins of the slow start for the binding?

yndtrud commented 1 month ago

> Would you be able to get a debug log for the full 10-15 mins of the slow start for the binding?

Logs with debug: oh420_ntw_debug.zip

Network adapters: ListNets.txt

I made two runs, fast and slow; finish load = System reached start level 100

  1. Slow (all adapters enabled), finish load: 15:59:35.161
  2. Fast (unnecessary adapters disabled), finish load: 16:03:40.807

With slow loading, there are many different errors when executing my rules, since the rules have already started working, but there is still no access to persistent storage or other things due to network binding problems.

Binding config:

[Screenshot: 2024-07-25_155306]

Network thing, 5-6 seconds to open the properties (all adapters enabled):

[Screenshot: 2024-07-25_160019]