openhab / openhab-distro

The binary distribution of openHAB
https://www.openhab.org/
Eclipse Public License 2.0
1.3k stars 394 forks source link

Performance issues on embedded ARM (e.g. Raspberry Pi 2) #10

Closed kaikreuzer closed 7 years ago

kaikreuzer commented 8 years ago

This is a follow up of https://github.com/openhab/openhab2/issues/514, where we assumed that switching to G1 GC fixes most of the performance issues (see https://github.com/openhab/openhab2/issues/514#issuecomment-162176235). Unfortunately, G1 is not supported on ARM, if you run the openHAB runtime on a Raspberry Pi 2, you will get

Java HotSpot(TM) Client VM warning: G1 GC is disabled in this release.

See also https://blogs.oracle.com/henrik/entry/oracle_releases_jdk_for_linux which states: "Some examples of missing features include the G1 GC".

I did now tests on a Raspberry Pi 2 (with java 1.8.0_40-b27) and measured these times for startup (until the dashboard and all UIs of the demo are available and the CPU usage dropped below 95%):

Even the installation of an additional binding through the Paper UI takes already more than 30 seconds - this is just adding a small bundle to the OSGi container...

The current openHAB 2 distribution (non-karaf) takes less than 30 seconds for startup, so the issue is clearly with Karaf. It seems that hardly anybody before us has ever tried to use Karaf 4 on a Raspberry Pi. The "plain" Karaf 4.0.3 e.g. sets the "-server" jvm flag, which isn't available on ARM. And even this empty Karaf container alone takes more than a minute for the startup.

THIS IS A SEVERE BLOCKER FOR MOVING AHEAD WITH OPENHAB 2!!!

One of the goals of openHAB 2 was to make it better suited for embedded environments - the current solution is in contrast UNUSABLE on it.

I guess we might need some input and suggestions from the Karaf team in order to take a decision on how to move forward. @ANierbeck, @cschneider, @jbonofre: Was it a bad idea from the start to use Karaf on embedded ARM computers or do you see some hope for us?

Want to back this issue? Post a bounty on it! We accept bounties via Bountysource.

ANierbeck commented 8 years ago

This is a rather strange thing. Did you run a vanilla Karaf on a PI before, any measurements there? I know that @jgoodyear did some testing of OpenDayLight (ODL) on a PI before, afaik he didn't spot such an issue.

kaikreuzer commented 8 years ago

Did you run a vanilla Karaf on a PI before, any measurements there?

Not before, but the "empty Karaf container alone takes more than a minute for the startup" was a test of vanilla Karaf today.

kaikreuzer commented 8 years ago

Ok, the only test I did not do today was the non-karaf openHAB runtime. It turns out that my Raspberry might have some general issues, since even openHAB 1.7.1 demo takes 2-3 minutes for startup.

So I guess I will have to flash my sd card and rerun the tests. @maggu2810, @dvanherbergen: Do you maybe have a board ready where you could briefly test the offline-distro?

ANierbeck commented 8 years ago

this is a performance measurement by @jgoodyear: http://icodebythesea.blogspot.de/2014/10/opendaylight-helium-on-raspberry-pi.html

dvanherbergen commented 8 years ago

I will do a test on a Raspberry 1.

kaikreuzer commented 8 years ago

I will do a test on a Raspberry 1.

Cool, thanks. Note that the RasPi1 will anyhow be about 4x slower than the RasPi2, so startup times of a few minutes are normal there.

this is a performance measurement by @jgoodyear:

Thanks @ANierbeck, I had seen this blog post, but couldn't really extract any information regarding startup times from it.

dvanherbergen commented 8 years ago

After a few restarts, I'm seeing about 5 minutes on a pi1 to get to the dashboard started.

Jan 03, 2016 10:48:09 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Trying to lock /home/pi/userdata/lock
Jan 03, 2016 10:48:09 PM org.apache.karaf.main.lock.SimpleFileLock lock
INFO: Lock acquired
Jan 03, 2016 10:48:09 PM org.apache.karaf.main.Main$KarafLockCallback lockAquired
INFO: Lock acquired. Setting startlevel to 100
2016-01-03 22:51:48.847 [INFO ] [smarthome.event.InboxUpdatedEvent   ] - Discovery Result with UID 'ntp:ntp:local' has been updated.
2016-01-03 22:52:00.500 [INFO ] [smarthome.event.InboxUpdatedEvent   ] - Discovery Result with UID 'yahooweather:weather:12817655' has been updated.
2016-01-03 22:52:29.512 [INFO ] [basic.internal.servlet.WebAppServlet] - Started Basic UI at /basicui/app
2016-01-03 22:53:01.458 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /ui
2016-01-03 22:53:01.461 [INFO ] [assic.internal.servlet.WebAppServlet] - Started Classic UI at /classicui/app
2016-01-03 22:53:01.524 [INFO ] [assic.internal.servlet.WebAppServlet] - Stopped Classic UI
2016-01-03 22:53:11.389 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at /start

My first impression is that the startup is mainly CPU bound. It runs at 100% for while. I'll investigate some more tomorrow.

maggu2810 commented 8 years ago

I did now tests on a Raspberry Pi 2 (with java 1.8.0_40-b27) and measured these times for startup (until the dashboard and all UIs of the demo are available and the CPU usage dropped below 95%):

  • first start: 22 minutes
  • subsequent start: 11 minutes

Even the installation of an additional binding through the Paper UI takes already more than 30 seconds - this is just adding a small bundle to the OSGi container...

I have not tested OH2 but another Eclipse SmartHome based product. I have not done a time measurement, but it need perhaps some minutes, but IIRC never above a few minutes.

cschneider commented 8 years ago

I just did a test to start a plain karaf 4.0.3 on a raspaberry pi 2 with java version build 1.8.0-b132. It takes about 40 seconds. This looks good to me. Will test the openhab2 distro with karaf next.

jgoodyear commented 8 years ago

Interesting...

Which JVM are you running? When i worked on getting ODL to run on RPI, I did a number of performance related tweaks to the system - those are outlined on my git repo: https://github.com/jgoodyear/ODLHeliumOnRPi/tree/master/rpi-configs

I'd also note that not all SD cards have similar performance - some are faster read/write than others.

Cheers, Jamie

On Sun, Jan 3, 2016 at 7:50 PM, Christian Schneider < notifications@github.com> wrote:

I just did a test to start a plain karaf 4.0.3 on a raspaberry pi 2 with java version build 1.8.0-b132. It takes about 40 seconds. This looks good to me. Will test the openhab2 distro with karaf next.

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab-distro/issues/10#issuecomment-168551931 .

ANierbeck commented 8 years ago

I also tested on a PI 1, took around 60 seconds to start a vanilla Karaf, I also tried to run the minimal distribution to see if it does have an effect it's also starting in about 60s. Switching to Equinox as basis didn't change anything in timing. At this point I'm quite satisfied :-)

dvanherbergen commented 8 years ago

I believe this could be related to https://github.com/openhab/openhab2/issues/563. It looks like a lot of time is being spent in binding the httpcontexts. The openhab offline version with the demo package binds about a dozen of them. @ANierbeck in the JettyServerWrapper.java, it scans all the bundles on the classpath for ServletContainerInitializer services every time a new httpcontext is added. The openhab offline distribution is quite large (120mb), and scanning all the jars on the pi can be time consuming. Is there a way to disable this behaviour for testing purposes?

ANierbeck commented 8 years ago

@dvanherbergen right now, no this isn't possible. But the scanning of the bundles is only bound to bundle which are in the "classpath" of the current or the jetty bundle, so if you don't import all other bundles, the scanning shouldn't be of much issue at this point.

see: https://github.com/ops4j/org.ops4j.pax.web/blob/master/pax-web-jetty/src/main/java/org/ops4j/pax/web/service/jetty/internal/JettyServerWrapper.java#L283-L287 and: https://github.com/ops4j/org.ops4j.pax.web/blob/master/pax-web-api/src/main/java/org/ops4j/pax/web/utils/ClassPathUtil.java#L140-L201

kaikreuzer commented 8 years ago

Thank you all for testing, so it seems indeed that I have some issue with my Pi, which I will have to investigate. I have removed the "severe" from the title, but as Karaf still adds some burden on the startup behaviour, we must to continue to look for optimizations.

jbonofre commented 8 years ago

I will take a look, but starting from a minimal distribution might help.

dvanherbergen commented 8 years ago

so if you don't import all other bundles, the scanning shouldn't be of much issue at this point.

I added a small timer to the addContext method in the JettyServerWrapper. It times from the start of the method until after the scan for the ServletContainerInitializer is completed. These are the results during the startup:

2016-01-04 09:06:59.352 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 6241 ms to check initializer
2016-01-04 09:09:12.615 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 45559 ms to check initializer
2016-01-04 09:09:25.823 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 12192 ms to check initializer
2016-01-04 09:10:23.594 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 37265 ms to check initializer
2016-01-04 09:10:59.443 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 33981 ms to check initializer
2016-01-04 09:11:05.959 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 6391 ms to check initializer
2016-01-04 09:11:09.190 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 2747 ms to check initializer
2016-01-04 09:11:10.043 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 745 ms to check initializer
2016-01-04 09:11:16.973 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 6367 ms to check initializer
2016-01-04 09:11:23.204 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 6116 ms to check initializer
2016-01-04 09:13:44.703 [WARN ] [ce.jetty.internal.JettyServerWrapper] - # 135450 ms to check initializer

The time taken varies greatly depending on which httpcontext is being added, but it is adding up to a large amount of time being spent on scanning.

dvanherbergen commented 8 years ago

I did another test, this time I removed the scan completely. On my raspberry pi 1, openHAB now starts in under 4 minutes (compared to 8 minutes with the scan). @ANierbeck any suggestions on how to proceed?

cschneider commented 8 years ago

I looked a bit into https://github.com/ops4j/org.ops4j.pax.web/blob/master/pax-web-jetty/src/main/java/org/ops4j/pax/web/service/jetty/internal/JettyServerWrapper.java. It only seems to scan for "META-INF/services". There should not be much below that path so I wonder why this seems to have such a big impact.

dvanherbergen commented 8 years ago

If I remember correctly, there is something wrong with the classloaders. The JettyServerWrapper will throw a ClassNotFoundException in some cases when it is trying to instantiate the ServletContainerInitializer class, even though it is available on the classpath. Maybe this has an impact on the time.

jbonofre commented 8 years ago

It could be related to bunch of META-INF/services to seek depending of the classloader.

ANierbeck commented 8 years ago

@jbonofre I already tested the vanilla minimal distribution, it took as long as the std. distribution on a PI 1. So in that case I doubt that it'll help here.

@dvanherbergen ... about the META-INF/service scanning, there are different ways to approach this.

  1. remove or disable the service scanning
  2. make this configurable
  3. make sure the application itself doesn't import to much unneeded, unwanted bundles
  4. make sure the web-applications use a shared HttpContext

right now I would highly favor 3 and 4 because 1 isn't acceptable at all, and 2 is just another flag which needs to be propagated throughout the application. That'll take some time till the next version does have it :-)

dvanherbergen commented 8 years ago

@ANierbeck is there any drawback/risk to using a shared HttpContext?

ANierbeck commented 8 years ago

@dvanherbergen, not to my knowledge. It's as if a bunch of servlets are registered from within the same Activator.

kaikreuzer commented 8 years ago

FTR: I have flashed my RPi2 with a new Rasbian image and the latest JDK 8. (First) Startup time for openHAB-Karaf-offline distro is now 4.5 min compared to 2 minutes of the non-karaf demo distro. So there is a major issue, which I still consider a blocker - but it seems that the http scanning is a major culprit here, so I am looking forward to a solution of it.

Even the installation of an additional binding through the Paper UI takes already more than 30 seconds - this is just adding a small bundle to the OSGi container...

This is now down to 10 secs, but even this feels like a long time for adding and starting a single bundle. Maybe it is due to some complex dependency resolution that is done by the feature installer?

dvanherbergen commented 8 years ago

@ANierbeck, when trying to reuse the same httpcontext, it fails with the following error:

2016-01-05 11:45:28.175 [ERROR] [org.eclipse.smarthome.ui            ] - [org.eclipse.smarthome.ui.chart(101)] The activate method has thrown an exception
java.lang.IllegalStateException: Http context DefaultHttpContext [bundle=org.eclipse.smarthome.io.net_0.8.0.201601050935 [109], contextID=default] is already associated to bundle org.eclipse.smarthome.ui.icon_0.8.0.201601050935 [129]
    at org.ops4j.pax.web.service.spi.model.ServerModel.associateHttpContext(ServerModel.java:278)
    at org.ops4j.pax.web.service.internal.HttpServiceStarted.getOrCreateContext(HttpServiceStarted.java:1003)
    at org.ops4j.pax.web.service.internal.HttpServiceStarted.registerServlet(HttpServiceStarted.java:185)

Is there a certain configuration property that needs to be changed?

ANierbeck commented 8 years ago

When registering you'll need to use a sharable HttpContext, which you can get from the WebContainer via getDefaultSharedHttpContext() see also here: https://github.com/ops4j/org.ops4j.pax.web/blob/master/pax-web-itest/pax-web-itest-base/src/main/java/org/ops4j/pax/web/itest/base/support/Bundle1Activator.java#L33-L41 to re-use this shared context you just need to reference it: https://github.com/ops4j/org.ops4j.pax.web/blob/master/pax-web-itest/pax-web-itest-base/src/main/java/org/ops4j/pax/web/itest/base/support/Bundle2Activator.java#L27-L41

dvanherbergen commented 8 years ago

mmm. That's not going to work. I don't think we should add dependencies from eclipse smarthome to pax-web.

kaikreuzer commented 8 years ago

I don't think we should add dependencies from eclipse smarthome to pax-web.

Definitely not. So option 1 might be the better choice for us after all?

dvanherbergen commented 8 years ago

I think so. At least temporarily until option 2 is available.

ANierbeck commented 8 years ago

If you can wait a bit for Pax-Web 6, it'll support the OSGi R6 specification for the WhiteBoard extender, then you can do it the "std." way.

Definitely not. So option 1 might be the better choice for us after all?

Not at all.

It may be possible to implement some caching, but I'm not done with that thought yet.

kaikreuzer commented 8 years ago

Not at all.

@ANierbeck: We need a solution until end of this week and the scanning adds a few minutes to the startup time. So what other option do you see than to deactivate it for the moment?

ANierbeck commented 8 years ago

@kaikreuzer, as openhab isn't the only application using Pax-Web, it just can't be disabled. Every application using JSF needs it. Even the WebSocketServer is registered with ServletContainerInitializers in Jetty.

Make sure to have a reduced dependency tree, which needs to be parsed. Till the end of the week I don't see anything that can be done here.

kaikreuzer commented 8 years ago

I am not asking to change anything in Pax-Web by the end of this week. But @dvanherbergen suggested to patch it specifically for openHAB as long as we have no other solution.

ANierbeck commented 8 years ago

In that case, feel free :-)

kaikreuzer commented 8 years ago

@dvanherbergen Any idea why we still have user reports where the startup takes 20 minutes and more? See e.g. https://github.com/openhab/openhab-distro/pull/19#issuecomment-185379142 or https://community.openhab.org/t/closed-very-slow-oh2-starting-up-on-synology-dsm6/7447/8. This is pretty weird, isn't it?

dvanherbergen commented 8 years ago

No idea. I will need to do some testing to verify. I'm a bit overloaded at work at the moment, I should be able to check this during the weekend.

dvanherbergen commented 8 years ago

I just did a fresh offline install on a raspberry pi 1 using the standard & demo package.

This looks OK to me, given that it's only a pi 1.

jgoodyear commented 8 years ago

Have you tried any of the performance tweaks I've outlined on my OpenDaylight Helium on RPi project? https://github.com/jgoodyear/ODLHeliumOnRPi/tree/master/rpi-configs

Cheers, Jamie

On Tue, Feb 23, 2016 at 3:29 PM, dvanherbergen notifications@github.com wrote:

I just did a fresh offline install on a raspberry pi 1 using the standard & demo package.

  • initial startup, until "Installed 'openhab-package-demo'" : 8 min 30 secs.
  • second startup, until "Started dashboard at /start" : 3 min 11 secs.

This looks OK to me, given that it's only a pi 1.

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab-distro/issues/10#issuecomment-187842176 .

haunma commented 8 years ago

Just as another data point, this issue is still very real. I installed the "online" version of openHAB2 via the debian jessie package in deb https://openhab.ci.cloudbees.com/job/openHAB-Distribution/ws/distributions/openhab-online/target/apt-repo/ / on my Jetson K1 board yesterday. This is a much faster system than any of the RPis (multicore Tegra K1, 2 GBytes RAM, SATA SSD).

Initial launch (until I could bring up the web page) took 20 minutes, 30 seconds. (I haven't timed a second or third launch yet.) Installing any binding takes ~ 5 minutes each. One or more CPUs were maxed out during all of this. The times would have been longer on a single-core machine.

I ssh'ed into the console. That was quite an adventure as the first time it took about one hour of max CPU before calming down. In the meantime the ssh client times out waiting for the password prompt. Googling for the timeout message takes me to the Karaf mailing list!! Not a promising sign. But eventually I was able to get in (it now takes "only" thirty seconds to reach a password prompt). "info" gives

openhab> info
Karaf
  Karaf version               4.0.4
  Karaf home                  /usr/share/openhab2/runtime/karaf
  Karaf base                  /var/lib/openhab2
  OSGi Framework              org.eclipse.osgi-3.10.2.v20150203-1939

JVM
  Java Virtual Machine        OpenJDK Zero VM version 25.91-b14
  Version                     1.8.0_91
  Vendor                      Oracle Corporation
  Pid                         25736
  Uptime                      19 hours 4 minutes
Error executing command: java.lang.NullPointerException

Perhaps the nightly build I'm running has some other issues? I never saw errors in the Paper UI. Well, except for the fact that the spinners never quit spinning until you reload the page. But I take it that is expected?

Coming from an embedded SW background this has been a little discouraging so far. I had no idea openHAB was so heavyweight :(

haunma commented 8 years ago

On subsequent startups, things are better... "only" 1 minute 20 seconds of CPU time. Some of the improvement may be due to switching from OpenJDK to Oracle JDK.

kaikreuzer commented 8 years ago

Some of the improvement may be due to switching from OpenJDK to Oracle JDK.

From my experience OpenJDK is not usable at all, because it indeed shows the performance issues you mention. With OracleJDK, the situation is much better.

jgoodyear commented 8 years ago

Hi Mark,

Just for comparison purposes, could you try installing a plain Karaf distribution (not OpenHab) just to see what the base container's boot up time is like?

On Sun, Jun 5, 2016 at 4:14 AM, Kai Kreuzer notifications@github.com wrote:

Some of the improvement may be due to switching from OpenJDK to Oracle JDK.

From my experience OpenJDK is not usable at all, because it indeed shows the performance issues you mention. With OracleJDK, the situation is much better.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/openhab/openhab-distro/issues/10#issuecomment-223796335, or mute the thread https://github.com/notifications/unsubscribe/AAYlIYawgO_dpNCe9hctRoJGjSoIiZS2ks5qInBRgaJpZM4G9xtg .

maggu2810 commented 8 years ago

I ssh'ed into the console. That was quite an adventure as the first time it took about one hour of max CPU before calming down. In the meantime the ssh client times out waiting for the password prompt. Googling for the timeout message takes me to the Karaf mailing list!!

The host key is generated on the first login. On my embedded systems it took around 5 minutes using the Oracle JVM. You can use a pre-generated one if you do not want this one-time time consuming step. But I don't think the distro should generate the same key for all machines.

The Jetty keystore is generated on the first start, too.

kaikreuzer commented 7 years ago

I would suggest to close this issue as the startup times seem to be acceptible by now - it is mainly the certificate generation on a very first startup that slows things down; nonetheless, I think this is a valuable feature that we should not drop.