openhab / openhab-distro

The binary distribution of openHAB
https://www.openhab.org/
Eclipse Public License 2.0

openHAB 4.1.0 Milestone 5 - Docker - Slow + 100% CPU load #1619

Closed: SHU-red closed this issue 4 months ago

SHU-red commented 5 months ago

Hi there, I know about these issues: https://github.com/openhab/openhab-distro/issues/1617 and https://github.com/openhab/openhab-distro/issues/1616, but I'm not sure whether my problem is different.

Since the update to M5 I get:

javax.ws.rs.ClientErrorException: HTTP 404 Not Found

at org.apache.cxf.jaxrs.utils.SpecExceptions.toHttpException(SpecExceptions.java:118) ~[bundleFile:3.6.2]

at org.apache.cxf.jaxrs.utils.ExceptionUtils.toHttpException(ExceptionUtils.java:168) ~[bundleFile:3.6.2]

at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:673) ~[bundleFile:3.6.2]

at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:182) ~[bundleFile:3.6.2]

at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]

at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:225) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:304) ~[bundleFile:3.6.2]

at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doGet(AbstractHTTPServlet.java:222) ~[bundleFile:3.6.2]

at javax.servlet.http.HttpServlet.service(HttpServlet.java:497) ~[bundleFile:4.0.4]

at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:279) ~[bundleFile:3.6.2]

at org.ops4j.pax.web.service.spi.servlet.OsgiInitializedServlet.service(OsgiInitializedServlet.java:102) ~[bundleFile:?]

at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) ~[bundleFile:9.4.52.v20230823]

at org.ops4j.pax.web.service.spi.servlet.OsgiFilterChain.doFilter(OsgiFilterChain.java:100) ~[bundleFile:?]

at org.ops4j.pax.web.service.jetty.internal.PaxWebServletHandler.doHandle(PaxWebServletHandler.java:320) ~[bundleFile:?]

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[bundleFile:9.4.52.v20230823]

at org.ops4j.pax.web.service.jetty.internal.PrioritizedHandlerCollection.handle(PrioritizedHandlerCollection.java:96) ~[bundleFile:?]

at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.52.v20230823]

at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.52.v20230823]

at java.lang.Thread.run(Thread.java:840) [?:?]
- Seen via frontail, followed by many of these:

2023-12-18 15:55:44.332 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 172.16.0.1/16, truncating to /24, some addresses might be lost

2023-12-18 15:55:44.337 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 192.168.240.1/20, truncating to /24, some addresses might be lost

2023-12-18 15:55:44.348 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 192.168.208.1/20, truncating to /24, some addresses might be lost

2023-12-18 15:55:44.365 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 192.168.224.1/20, truncating to /24, some addresses might be lost

2023-12-18 15:55:44.370 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 192.168.112.1/20, truncating to /24, some addresses might be lost

2023-12-18 15:55:44.391 [INFO ] [.network.internal.utils.NetworkUtils] - CIDR prefix is smaller than /24 on interface with address 172.30.0.1/16, truncating to /24, some addresses might be lost


All in all, it is a very slow process and many things are not working due to the unresponsiveness caused by the high system load (it seems).

Infos about my system:

- openHAB info:

runtimeInfo:
  version: 4.1.0.M5
  buildString: Milestone Build
locale: de-DE
systemInfo:
  configFolder: /openhab/conf
  userdataFolder: /openhab/userdata
  logFolder: /openhab/userdata/logs
  javaVersion: 17.0.9
  javaVendor: Debian
  osName: Linux
  osVersion: 6.5.11-300.fc39.x86_64
  osArchitecture: amd64
  availableProcessors: 2
  freeMemory: 1327658472
  totalMemory: 2122317824
  uptime: 218
  startLevel: 30
addons: null
clientInfo:
  device:
    ios: false
    android: false
    androidChrome: false
    desktop: true
    iphone: false
    ipod: false
    ipad: false
    edge: false
    ie: false
    firefox: true
    macos: false
    windows: false
    cordova: false
    phonegap: false
    electron: false
    nwjs: false
    webView: false
    webview: false
    standalone: false
    pixelRatio: 1
    prefersColorScheme: dark
  isSecureContext: false
  locationbarVisible: true
  menubarVisible: true
  navigator:
    cookieEnabled: true
    deviceMemory: N/A
    hardwareConcurrency: 8
    language: de
    languages:

Using the image: "openhab/openhab:milestone"

SHU-red commented 5 months ago

Going back to the openhab:latest image containing openHAB 4.0.4 solves the issue

kaikreuzer commented 5 months ago

The info log entries are coming from the network binding. Could you check whether the behavior disappears if you disable the network binding Things temporarily (or uninstall the network binding completely)?

SHU-red commented 5 months ago

I did the following:

  1. On openHAB 4.0.4, deactivated all network Things via the GUI
  2. Re-deployed openHAB M5: the CPU load is much lower

Seems to be the root cause

Maybe off topic: now the frontail log is completely flooded with many of the following errors (not sure if this is a migration thing?)

Script execution of rule with UID 'counters-3' failed: An error occurred during the script execution: Cannot invoke "org.openhab.core.model.rule.scoping.RulesClassCache.get(Object)" because "cache" is null in counters

EDIT: Additionally

kaikreuzer commented 5 months ago

Good to hear that the CPU load issue has been located and that it is not a general issue in core. I'm not sure what might cause this in the network binding, though. We would probably need to have a closer look at exactly what kind of Things you have configured there. I'd suggest raising an issue for the network binding in openhab-addons.

Wrt the other comments: All of these would need more analysis and details before anything can be done about them.

SHU-red commented 5 months ago

Sorry @kaikreuzer, my morning was kind of hectic and I think I gave you wrong information.

Even after deactivating the network Things and uninstalling the Network add-on, I get either openHAB not booting at all (low CPU) or openHAB booting and causing high CPU utilization.

If I can do anything to help with debugging, I will try to do so.

miloit commented 5 months ago

I can also confirm the same behavior on my Raspberry Pi 4.

kaikreuzer commented 5 months ago

@miloit Also on docker?

@SHU-red Could you try to disable the three "suggestion finders" under Settings->System Settings->Add-on Management->Show advanced?

@wborn Do you notice any general issue with M5 in docker?

@mherwege & @andrewfg Do you think the suggestion finders could have any side effects here when using docker?

andrewfg commented 5 months ago

Do you think the suggestion finders could have any side effects here when using docker?

Let me answer in several parts...

miloit commented 5 months ago

I am on openHABian with a Raspberry Pi 4.

mherwege commented 4 months ago
  • I can't comment on the process scanner, but AFAIK it is a one-shot scan rather than repetitive. ??

It scans when suggestions are retrieved from the REST API. That would be when opening the UI setup wizard or the add-on store, so I cannot see an impact at startup.

The IP scan finder scans only once at startup.

The process finder cannot be disabled. The other three can be, through an advanced option for add-ons. Because of this, they are only installed (their feature is installed) when the suggestion finder service is activated. Could installing these features at that time have this impact? This is similar to installing add-ons. Disabling these finders should make it possible to see this. Also, switching on trace logging for org.openhab.core.config.discovery.addon could show the sequence and timing of these feature installations. I don't have experience with Docker, so I cannot test myself. I am not convinced that's the reason, but it could be worth checking out.
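
For reference, a minimal sketch of how that trace logging could be switched on from the openHAB (Karaf) console, using the standard Karaf logging commands (adjust the package name if needed):

    log:set TRACE org.openhab.core.config.discovery.addon
    log:tail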

holgerfriedrich commented 4 months ago

I cannot reproduce this behavior on 4.1.0.M5 (without Docker on RPI4 openhabian, and with Docker on Windows). I created the container a few times and restarted as well. I experience normal CPU usage.

As Kai said, it seems the message

CIDR prefix is smaller than /24 on interface with address .../16, truncating to /24, some addresses might be lost

is from the network binding. If I uninstall the binding, the message is gone.

It seems we need your help to debug this.

J-N-K commented 4 months ago

Indeed, the IP scan in the network binding is problematic. I remember that I introduced the code that prints this warning, and IIRC the issue is that the underlying code generates a list of all possible IP addresses before the scan starts. A /16 subnet has 65534 hosts, so a lot of memory is allocated, which can eventually lead to an OOM exception on low-memory systems. It might well be that the code also blocks a thread while scanning, and that would be an issue even for /24 subnets during startup.
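
To illustrate the scale described above (a minimal sketch for this issue, not the binding's actual code): a /16 subnet has 65534 usable hosts versus 254 for a /24, so eagerly building the full host list before scanning allocates tens of thousands of objects up front.

    import java.util.ArrayList;
    import java.util.List;

    public class SubnetScanSketch {

        // Usable IPv4 hosts in a subnet (network and broadcast addresses excluded).
        static int hostCount(int prefixLength) {
            return (1 << (32 - prefixLength)) - 2;
        }

        // Illustrative only: materializing every host of 172.16.0.0/16 up front,
        // as the discovery code reportedly does, creates 65534 strings before any scan starts.
        static List<String> allHosts() {
            List<String> hosts = new ArrayList<>();
            for (int i = 1; i <= hostCount(16); i++) {
                hosts.add("172.16." + (i / 256) + "." + (i % 256));
            }
            return hosts;
        }

        public static void main(String[] args) {
            System.out.println("/16 hosts: " + hostCount(16)); // 65534
            System.out.println("/24 hosts: " + hostCount(24)); // 254
            System.out.println("List built up front: " + allHosts().size());
        }
    }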

SHU-red commented 4 months ago

There is something that comes to my mind which could be connected to this problem:

Could this be the reason why my system causes problems (by scanning a huge amount of addresses), compared to other ones?

I'm not sure what will happen if I make changes to this file while running many containers, but I could maybe take the time to stop all of my containers, re-configure the address range, and re-test M5.

But this is just a guess
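
For illustration only, and assuming the file in question is Docker's daemon configuration (/etc/docker/daemon.json, which is an assumption on my part, not confirmed above): narrowing the default address pools so that newly created bridge networks get /24 subnets instead of /16 or /20 ones could look like the sketch below. The Docker daemon would need a restart, and existing networks would have to be recreated to pick it up.

    {
      "default-address-pools": [
        { "base": "10.10.0.0/16", "size": 24 }
      ]
    }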

wborn commented 4 months ago

Do you notice any general issue with M5 in docker?

Works fine for me so far.

I think it is a known issue that when using the network binding with Docker it scans too many interfaces, resulting in a lot of load; see also:

SHU-red commented 4 months ago

I think it is a known issue that when using the network binding with Docker it scans too many interfaces

Sounds like that could be it! Thanks.

kaikreuzer commented 4 months ago

So can we close this issue then, @SHU-red?

SHU-red commented 4 months ago

I'm not absolutely sure :sweat_smile: It looks like I will face problems as soon as I switch to a release of M5 or newer...

But if you want to propose that we close this issue and track the debugging in one of the other already existing issues, that makes sense to me.

Are the other known issues also specifically related to M5? In my case, M4 was running perfectly and it is M5 in particular that brings these problems. I just wanted to make sure that the topics are really about the same root cause.

kaikreuzer commented 4 months ago

Yes, I would have thought that it can be further tracked under https://github.com/openhab/openhab-addons/issues/15437.

But why you never had the problem and it only turns up now with M5 is indeed a mystery...

SHU-red commented 4 months ago

OK, whatever seems sensible to you is fine with me.

Tomorrow I will post a comment on the other topic. If I can help track down the problem, just let me know.

Thank you for taking the time

SHU-red commented 4 months ago

Closed on the assumption that the root cause is the same as in https://github.com/openhab/openhab-addons/issues/15437.