Analyze and improve startup time and resource usage

kaikreuzer commented 8 years ago

When trying the runtime on a RasPi 2, it took a couple of minutes to start up, while the "old" distribution takes only around 20 seconds on the same hardware. I assume that there might be still something wrong regarding trying to check external repositories or something. Otherwise I would not know why it could be so much slower.

kaikreuzer commented 8 years ago

I just compared the current openHAB demo distribution zip with the Karaf distribution with installed demo (after starting, simply run feature:install openhab-demo) using jvisualvm.

The results are worrying: jvisualvm shows a used heap of 300MB (vs 75MB) and around 150 threads vs. 75 threads.

It seems as if the Karaf distribution is MUCH heavier than our normal distribution. @maggu2810 If we cannot solve this, this could become a blocker for the whole approach; at least, we would probably have to offer both variants, which isn't desirable at all.

@dvanherbergen, is this maybe something you could analyze? Why is there such a big difference and what are our options to reduce it?

dvanherbergen commented 8 years ago

@kaikreuzer I've missed most of the discussion, so what is the ultimate goal? keep 2 runtimes or eventually move towards karaf only? I will look into this as soon as possible, but this weekend some time is reserved for St. Nicholas. Does he visit Germany too?

dvanherbergen commented 8 years ago

@kaikreuzer never mind, I just reread your comment and noticed the goal is to move to karaf only. Time to put my glasses on...

kaikreuzer commented 8 years ago

the goal is to move to karaf only

Correct, that was/is the plan!

St. Nicholas. Does he visit Germany too?

Yes, but only at night ;-)

dvanherbergen commented 8 years ago

wow... I just launched the openhab-runtime in master and the two runtimes in the karaf branch (without installing the openhab-demo). They all have different very usage profiles:

the equinox runtime: heap allocation starts at 500Mb, heap usage starts around 80Mb and goes up to 190MB or 300Mb before garbage collection kicks in and it drops back to 50Mb.
the openhab runtime in the karaf branch: heap allocation starts at 990Mb and climbs to 1.2Gb, heap usage starts around 500Mb and climbs to 690Mb before GC kicks in and drops it to 78Mb.
the openhab-karaf runtime in the karaf branch: heap allocation starts at 500Mb but very quickly expands to 1.5Gb, heap usage starts around 127Mb and climbs to 900Mb before GC kicks in and drops it to 76Mb. This will be an interesting one to figure out...

kaikreuzer commented 8 years ago

openhab-karaf runtime

Forget about this one. The "openhab runtime in the karaf branch" is simply a repackaging of the openhab-karaf to adapt the folder structure to openHAB needs.

kaikreuzer commented 8 years ago

@maggu2810: I suspect that this is due to https://github.com/openhab/openhab2/pull/476 - it probably now scans all jars that we package at startup for their content, instead of treating them simply as a repository from where it can potentially load the add-ons if required.

@dvanherbergen : It would be interesting to compare the resource requirements of a plain Karaf runtime (http://karaf.apache.org/index/community/download.html), to see what would be the lowest limit of all efforts that we could reach.

dvanherbergen commented 8 years ago

When I run with the JVM with option -XX:+UseG1GC , it behaves much better. No unnecessary large heap is allocated and unreferenced objects are released quickly. The openhab-karaf runtime (without demo) dropped after a forced GC to an allocated heap of 100Mb and usage fluctuating between 33Mb and 86Mb. The PermGen space seems big though. Do we really need 50Mb worth of classes?

dvanherbergen commented 8 years ago

Using the default garbage collector, the plain karaf runtime starts at 75Mb used and drops back to a minimum of 26 after a while and then starts climbing again. PermGen is around 25Mb. With the G1 collector, heap usage starts around 30Mb and then slowly starts climbing. When forcing GC, usage goes down to 12Mb.

So overall, with the GC1 collector heap usage seems better managed, so I suggest we use this one as the default. I'll have a look at the number of threads next.

dvanherbergen commented 8 years ago

A little after the container starts, I'm seeing a huge spike in live threads. There around 200 threads created for uPnP, which are discarded afterwards, leaving the average live threads around 100. After a while, uPnP spins up yet another bunch of threads (50 to 100), which are again quickly discarded. Something doesn't seem right there, but this behaviour is the same for the equinox & karaf versions. The difference between the base load of 50 live threads on equinox and 100 live threads on karaf, seems to be caused mainly by pax-web (10 threads) and another thread pool (25 threads) which I haven't been able to identify yet. To be continued...

kaikreuzer commented 8 years ago

so I suggest we use GC1 as the default

+1

There around 200 threads created for uPnP

This is a bug I discovered in jUPnP yesterday as well - it will be fixed in the next days, so don't worry about that for the moment.

caused mainly by pax-web (10 threads) and another thread pool (25 threads)

Ok, this is the important difference then that needs to be investigated!

dvanherbergen commented 8 years ago

20 of the unidentified threads belong to a thread pool dedicated for org.apache.felix.eventadmin.impl.handler.EventAdminImpl. Apparently, felix' EventAdmin's default is 20 threads for event delivery. This can be changed using by setting the property org.apache.felix.eventadmin.ThreadPoolSize. The property should be set to a property higher than 1 otherwise it will fallback to the default of 20. The max pool size is defined by the property value + 10, so I suggest we set it the default to 3. That should be more than sufficient for openhab.

dvanherbergen commented 8 years ago

The pax-web threads are supposedly configurable by the properties org.ops4j.pax.web.server.minThreads=3 org.ops4j.pax.web.server.maxThreads=5 Some of the other extra threads (6) are due to the ssh server which is built in. I will check tomorrow if reducing the threads using these properties works.

maggu2810 commented 8 years ago

Thanks for your detailed informations. IMHO the ssh server shoult be disabled for the normal OH runtime. I kept them alive for my daily work. Nothing that is necessary for a release. Perhaps we could add a few such features to the extension service.

dvanherbergen commented 8 years ago

It turns out that there is only 1 ssh thread running by default. When you connect to the console using the ./client, it fires up another 5, so I'm not sure if we would be able to get rid of this. And for me personally, having ssh directly to the console would be convenient too..

dvanherbergen commented 8 years ago

I created PR 514 for updating the GC and reducing the eventadmin threads. This brings heap usage down to around 80Mb and the number of live threads to 73. No success with the jetty related threads yet, as I'm getting errors like

java.lang.IllegalStateException: Multiple servlets map to path: /basicui/app/*: org.ops4j.pax.web.service.spi.model.ServletModel-16,org.ops4j.pax.web.service.spi.model.ServletModel-18 at org.eclipse.jetty.servlet.ServletHandler.updateMappings(ServletHandler.java:1488)

dvanherbergen commented 8 years ago

When starting, pax-web shows this annoying warning message:

[ce.jetty.internal.JettyServerWrapper] - failed to parse and instantiate of javax.servlet.ServletContainerInitializer in classpath

This is caused by an invalid service defined in the jax-rs dependency. I've opened an issue for this: hstaudacher/osgi-jax-rs-connector#131

maggu2810 commented 8 years ago

Multiple servlets map to path: /basicui/app/*:

@dvanherbergen This should be solved by https://github.com/eclipse/smarthome/pull/699

maggu2810 commented 8 years ago

jvisualvm shows a used heap of 300MB (vs 75MB) and around 150 threads vs. 75 threads.

This brings heap usage down to around 80Mb and the number of live threads to 73.

So, all fine now?

dvanherbergen commented 8 years ago

Memory consumption is fine. Even an Xmx of 128Mb seems to work OK. The total number of threads (live+daemon) is still higher, as Karaf adds some additional management threads. We may be able to optimize this further, but it shouldn't hinder performance too much. Startup time is still rather slow. During the first startup, all bundles are installed and that takes some time. Subsequent starts are faster, but still take some time. There may be some more room for optimisation, so I wouldn't close this issue yet.

ANierbeck commented 8 years ago

On the jetty/pax-web front if there is anything I can help with, just gimme a ping :-)

dvanherbergen commented 8 years ago

Great! As it happens, I do have a question for you :-) Pax-web launches a number of org.ops4j.pax.web.service.jetty.internal.LateInvalidatingHashSessionManager threads. Is there a way to limit the number of threads being spawned? Would a single thread not be sufficient to scavenge sessions for multiple jetty threads?

ANierbeck commented 8 years ago

Actually those are handled by Jetty itself, Pax-Web only provides an alternative SessionManager, which is needed for the HttpService. So in that case I'd think, if we reduce the Jetty-Threads it'll be in sync.

dvanherbergen commented 8 years ago

That doesn’t seem to work. When using org.ops4j.pax.web.server.maxThreads = 7, the number of ‘base’ jetty threads is 7. And then there is an additional 8 threads for the SessionManager. Now, when I set the min and max both to 20, the number of base threads is 20 and there are still 8 threads for the SessionManager. So it would appear that the number is fixed.

ANierbeck commented 8 years ago

Actually the SessionManager is bound to every HttpContext, so if you have 8 bundles using a session manager you have all of them running.

kaikreuzer commented 8 years ago

Does this mean we should make sure to use a single instance of HttpContext? I wasn't aware that we are creating so many instances. But indeed, we usually use httpService.createDefaultHttpContext(), which creates a new instance every time. Shall we introduce a special HttpContextService in ESH? Does anyone know how well this can be integrated with Jersey and possible authentication/session mechanisms?

ANierbeck commented 8 years ago

Hi,

no this doesn't necessarily mean that. Usually you have one Bundle with one HttpContext. And it's fine to use the httpService.createDefaultHttpContext for it. Now depending on your use-case with Pax-Web it's possible to share the Context between Bundles. For example if you want to separate filters from servlets. In this last case you can use pax-web WebContainer Interface to create SharedContextes. As I'm not sure how those use-cases are handled right now, I'd say it's something that needs to be investigated, to find out what suits best.

2015-12-13 11:40 GMT+01:00 Kai Kreuzer notifications@github.com:

Does this mean we should make sure to use a single instance of HttpContext? I wasn't aware that we are creating so many instances. But indeed, we usually use httpService.createDefaultHttpContext(), which creates a new instance every time. Shall we introduce a special HttpContextService in ESH? Does anyone know how well this can be integrated with Jersey and possible authentication/session mechanisms?

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab2/issues/514#issuecomment-164246561.

Apache Member Apache Karaf http://karaf.apache.org/ Committer & PMC OPS4J Pax Web http://wiki.ops4j.org/display/paxweb/Pax+Web/ Committer & Project Lead blog http://notizblog.nierbeck.de/ Co-Author of Apache Karaf Cookbook http://bit.ly/1ps9rkS

Software Architect / Project Manager / Scrum Master

dvanherbergen commented 8 years ago

I had another look at the startup time. By commenting out the CoreActivator.startRuleEngine() method, I was able to reduce my startup time by 12 seconds ;-)

The hardcoded waits in there should be replaced with a different solution. I suggest a change the FolderObserver in ESH. It could send out a ModelsLoadedEvent after it has done the initial load of the all available models. Then the rule engines can wait for this event to occur before they start processing rules.

kaikreuzer commented 8 years ago

Interesting :-) This code is indeed a dirty hack and I am ashamed of it ;-) Nonetheless I remember that I didn't find any other solution to the problem at the time - startup rules should only be executed when the system has started up. A change that I did at the same time as introducing this hack is this: https://github.com/openhab/openhab2/commit/e610bce83d8718136e07506e9707473ae9788aef#diff-9eb4f271b48006052139077eded10b83R84 The core bundle is started as the very last one, so it actually should never block 12 seconds, because the other services should be there already. I guess this setting has been lost in the Karaf distro. Maybe you can gain already 10 seconds by changing the start level of the bundle as well?

dvanherbergen commented 8 years ago

The start level change has indeed been lost. The core bundle in the karaf branch starts at level 80, which is the default.

On 2015-12-16, at 17:36, Kai Kreuzer notifications@github.com wrote:

Interesting :-) This code is indeed a dirty hack and I am ashamed of it ;-) Nonetheless I remember that I didn't find any other solution to the problem at the time - startup rules should only be executed when the system has started up. A change that I did at the same time as introducing this hack is this: e610bce#diff-9eb4f271b48006052139077eded10b83R84 https://github.com/openhab/openhab2/commit/e610bce83d8718136e07506e9707473ae9788aef#diff-9eb4f271b48006052139077eded10b83R84 The core bundle is started as the very last one, so it actually should never block 12 seconds, because the other services should be there already. I guess this setting has been lost in the Karaf distro. Maybe you can gain already 10 seconds by changing the start level of the bundle as well?

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab2/issues/514#issuecomment-165167534.

maggu2810 commented 8 years ago

It was a long way to get rid of the start level requirements and I was happy that it works at least. As I have not done start up time analyzes, I was fine with the current state. If never seen this hack before, IIRC I do not use that core bundle at all. Would be great if the situation could be solved without start level changes in a non hackish way. :wink:

kaikreuzer commented 8 years ago

Would be great if the situation could be solved without start level changes in a non hackish way. :wink:

I fully agree and you are both invited to come up with a clean solution. I actually remember that I looked at something like a OSGi start level service, but I didn't succeed in being notified when a certain start level is reached (and then trigger the startup rule execution). If any of you finds a way how to do this, that would be great :-) Until then I would prefer to have the start level changed for this bundle.

dvanherbergen commented 8 years ago

One option would be to poll the start level in a separate thread until it reaches 100 and then publish a SystemStartedEvent. Polling the start level can be done using https://osgi.org/javadoc/r4v43/core/org/osgi/framework/startlevel/FrameworkStartLevel.html#getStartLevel(). The rule engine could then listen for the SystemStartedEvent to trigger startup rules.

kaikreuzer commented 8 years ago

Where should this polling thread be best located? Would we invent a new bundle for it?

kgoderis commented 8 years ago

Sorry for being late to the party, but you guys have an interesting chat ;-) . Reading through all the optimisations you are doing, I was wondering if this is the time to look as well to the startup order of things in ESH? Cfr various of my posts on the different fora, as very very early adaptor of ESH OH2, there is a need to synchronise a bit how things are loaded and parsed wrt Items, Things, and Rules. Still as of today, with an ESH core not older than two weeks, I have very random behaviour at startup where .items are not parsed at all, updates are received for non-existing Items and so forth.

kaikreuzer commented 8 years ago

@kgoderis You are well aware that @maggu2810 just mentioned in https://github.com/openhab/openhab2/issues/514#issuecomment-165447988 that he is actually more rigid than I am when it comes to abolishing start level order of bundles. So no, it might not the best time to come up with that ;-)

dvanherbergen commented 8 years ago

Where should this polling thread be best located? Would we invent a new bundle for it?

How about org.eclipse.smarthome.core ?

dvanherbergen commented 8 years ago

@kgoderis to really eliminate funky stuff startup issues, I believe you would also need to block event publishing during startup. This could be a possible solution:

Both the rule engine and event publisher startup in an inactive mode
In the core bundle, the system startup level is polled until the system is fully started and then emits a SystemStartEvent.
The event publisher picks up the event and starts passing through events. In turn it also emits an BusStartedEvent
The rule engine picks up the BusStartedEvent and kicks off the startup rules. With this approach, the start level of bundles is irrelevant.

maggu2810 commented 8 years ago

@dvanherbergen I like your idea.

But I think you can't use https://osgi.org/javadoc/r4v43/core/org/osgi/framework/startlevel/FrameworkStartLevel.html#getStartLevel(). For which start level you would like to check? The start level that are used for bundles is product specific. The Eclipse IDE for openHBA currently uses start level 1-5. You cannot rely that at start level e.g. 100 the Eclipse SmartHome framework is loaded correctly if I use start level 110 for all ESH bundles.

kaikreuzer commented 8 years ago

Well, the start level for "system up" would have to be configurable through config admin, I guess.

dvanherbergen commented 8 years ago

If you use the command system:start-level on the karaf console, you can see it the 'final' level where the runtime stops incrementing is 100. I'm not sure if this is the same for both equinox and felix.

dvanherbergen commented 8 years ago

When using the equinox runtime with -Dosgi.startLevel=100, it also increments start levels up to 100, so that should work.

maggu2810 commented 8 years ago

If it is configurable I am fine with waiting for a special start level and act if this is reached. For Karaf you could set the startup level that should be reached after start using etc/config.properties "org.osgi.framework.startlevel.beginning=100".

maggu2810 commented 8 years ago

What about this one: http://eclipsesource.com/blogs/2013/01/23/how-to-track-lifecycle-changes-of-osgi-bundles/

dvanherbergen commented 8 years ago

It says:

A BundleListener can only receive notifications after it has been started.

wouldn't that require us to changing startup level sequences again?

maggu2810 commented 8 years ago

Just one other idea. If we want to wait until the bundle1, bundle2 and bundle3 has been started. Would it be an option that every bundleX register (after successful start) a service bundleXready or something similar and stop that dummy service if it is stopped. Another bundle is using a service tracker to track that services and if all are ready it signals an system completely started (through event or another service)?

ANierbeck commented 8 years ago

Guys, afaik Karaf has a service which is available in case a certain StartLevel is reached. This one is also used to wait for Karaf BootFinished in the Pax-Exam tests. Btw. the Interface is called BootFinished, maybe that'll help you already and get rid of the StartLevel thinking ;)

2015-12-17 15:37 GMT+01:00 dvanherbergen notifications@github.com:

It says:

A BundleListener can only receive notifications after it has been started.

wouldn't that require us to changing startup level sequences again?

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab2/issues/514#issuecomment-165468125.

Apache Member Apache Karaf http://karaf.apache.org/ Committer & PMC OPS4J Pax Web http://wiki.ops4j.org/display/paxweb/Pax+Web/ Committer & Project Lead blog http://notizblog.nierbeck.de/ Co-Author of Apache Karaf Cookbook http://bit.ly/1ps9rkS

Software Architect / Project Manager / Scrum Master

dvanherbergen commented 8 years ago

@kaikreuzer will smarthome also move to karaf? If so, then BootFinished sounds like a good option.

ANierbeck commented 8 years ago

Just in Case one wonders, if all BootFeatures are installed and the feature service tells that it's done, the BootFinished Service is published [1]

regards, Achim

[1] - https://github.com/apache/karaf/blob/master/features/core/src/main/java/org/apache/karaf/features/internal/service/BootFeaturesInstaller.java#L115

2015-12-17 15:49 GMT+01:00 dvanherbergen notifications@github.com:

@kaikreuzer https://github.com/kaikreuzer will smarthome also move to karaf? If so, then BootFinished sounds like a good option.

— Reply to this email directly or view it on GitHub https://github.com/openhab/openhab2/issues/514#issuecomment-165471629.

Apache Member Apache Karaf http://karaf.apache.org/ Committer & PMC OPS4J Pax Web http://wiki.ops4j.org/display/paxweb/Pax+Web/ Committer & Project Lead blog http://notizblog.nierbeck.de/ Co-Author of Apache Karaf Cookbook http://bit.ly/1ps9rkS

Software Architect / Project Manager / Scrum Master

maggu2810 commented 8 years ago

@ANierbeck Hi, you know, I love Karaf :wink:, but I don't believe it is an option for the Eclipse SmartHome framework. The framework does not depend on stuff like Karaf etc.

For openHAB it would be an option but not for Eclipse SmartHome bundles.

openhab / openhab-addons

Analyze and improve startup time and resource usage #514