Closed bjhargrave closed 15 years ago
Comment author: andy.piper@oracle.com
Module Contexts that are started asynchronously (the default) can currently be initialized in an order that violates the StartLevel ordering of the bundles.
Although asynchronous startup is necessary to deal with service dependencies and cycles, it does not require that StartLevel be ignored.
I would like to see RFC 124 require that StartLevel be observed for module contexts, so that module contexts at a given start level (determined by their enclosing bundle) are started only after module contexts with a lower StartLevel have been started. Module contexts with the same start level should continue to be created concurrently.
This semantic seems consistent with the intent of StartLevels.
Comment author: hal.hildebrand@oracle.com
I'd like to add support for this. I had originally thought that respecting the start level was implicit in the way things would naturally be implemented in Blueprint, but we have discovered that the Spring/DM implementation does not implicitly conform to the startup level semantics of the container and so this needs to be called out in the spec.
Comment author: @bjhargrave
I don't see why any extender (or any bundle for that matter) should care about start levels. If a bundle is started, then it gets processed. Otherwise it doesn't.
Since the framework must increment start levels sequentially, an extender will see the bundle events in the order the bundles are started. If the order of event delivery means something to the extender, that's fine. But I would be loath to see the spec state that the extender must adhere to start levels, which is a framework concept.
So I guess I am against this. I am not convinced this is a good idea.
Comment author: hal.hildebrand@oracle.com
The issue is asynchronous start, which is the default for Blueprint. The bundle events may be seen in the correct order, but they may not be processed in the correct order. Consequently, it seems like a framework-level service contract would be violated. If bundles depend on the start level, then this contract would be violated, meaning that contexts with a higher start level may be started before contexts with a lower start level.
Comment author: @bjhargrave
If you want to spec that Blueprint should process bundles in the order it observes them being ready (e.g. started), that would be reasonable.
If you want to spec that Blueprint must understand bundle start levels, then I object to that.
Comment author: hal.hildebrand@oracle.com
Yes, the bug is a request that Blueprint process the bundles in the order they were started, not that Blueprint understand start levels.
Comment author: glyn.normington@springsource.com
Yes, the bug is a request that Blueprint process the bundles in the order they were started, not that Blueprint understand start levels.
I think there is a significant downside to the Blueprint service not understanding start levels here. It means that each started bundle must have its module context completely built before the Blueprint service starts building the module context of the next started bundle. So if there are many bundles at the same start level, their module contexts would need to be built sequentially.
If the Blueprint service did understand start levels, there would be the possibility of parallelising the building of module contexts within the same start level. This seems like a desirable feature on multi-core systems.
Comment author: @bjhargrave
Blueprint can also look for the FrameworkEvent.STARTLEVEL_CHANGED event to detect boundaries between start levels.
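As a rough illustration of this suggestion (not part of any comment above), a minimal sketch of an extender listening for that boundary event might look like the following; the class name and the queue-flushing step are purely illustrative:

```java
import org.osgi.framework.FrameworkEvent;
import org.osgi.framework.FrameworkListener;

// Illustrative only: detect start level boundaries via STARTLEVEL_CHANGED.
public class StartLevelBoundaryDetector implements FrameworkListener {

    public void frameworkEvent(FrameworkEvent event) {
        if (event.getType() == FrameworkEvent.STARTLEVEL_CHANGED) {
            // The framework has finished moving to its requested start level;
            // bundle events queued so far belong to the levels just completed.
            flushQueuedBundleEvents();
        }
    }

    private void flushQueuedBundleEvents() {
        // Hypothetical: process the module contexts queued for the completed levels.
    }
}

// Registration, e.g. from the extender's activator:
// bundleContext.addFrameworkListener(new StartLevelBoundaryDetector());
```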
Comment author: @tjwatson
Blueprint can also look for the FrameworkEvent.STARTLEVEL_CHANGED event to detect boundaries between start levels.
This can only work if the extender bundle is started at a very early start-level and all bundles are only ever started/stopped as a result of changing the framework start level.
What about systems that are already running (at their "final" start level) where new bundles are installed and started? There is no way to tell the extender that the bundle/framework events should be batched/queued for processing later. A similar situation arises with refresh packages. How is the extender to know that a set of bundles is being stopped and restarted as a result of a refresh packages operation or a start-level change? How would it know to wait for a STARTLEVEL_CHANGED or PACKAGES_REFRESHED event before processing the events and figuring out the start level boundaries?
I think pursuing a solution here is fruitless, especially in a running system, where start level changes and refresh package operations are likely to be rare. The installation of new components may not be so rare. The only thing I can see being spec'ed is BJ's suggestion in comment 4 to mandate paying attention to bundle start/stop order.
Comment author: schnabel@us.ibm.com
Maybe I'm missing something here: If there is a requirement that the module context for one bundle be started before the module context for another bundle, wouldn't there (more than likely) be some reference/dependency between the two that would force blueprint to keep that requirement? If you're sensitive to startup order, wouldn't adding such a dependency be a better way to ensure that modules are started in the order you need?
I think making blueprint startup level aware is strange, and as Tom says, would require/assume that the blueprint bundle is started early enough for start levels to matter.
Comment author: hal.hildebrand@oracle.com
I really don't want the perfect becoming the enemy of the good. If the extender isn't loaded first, then of course it can't do anything about bundle startup order. That's just life, and it is simply documented as the inevitable and unavoidable by-product of loading the extender after the bundles it will operate on have been loaded.
I don't see that as a big deal, nor do I see solving this unsolvable problem as necessary to resolving this issue.
Indeed, the way to ensure this semantic is to pay attention to the bundle startup order, which is the only practical way the extender can infer anything about the start level. This is simply accomplished, in Blueprint for example, by having a queue into which the bundles to be started asynchronously are placed, in the order in which their synchronous bundle STARTED events occur. The thread that would normally have done the startup then just pulls from this queue to start up the bundles.
Very simple, and obeys exactly the semantics this bug is calling for.
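A minimal sketch of the queue-based approach described above, assuming a SynchronousBundleListener feeding a single worker thread; the class and method names are illustrative, not from the spec or any implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleEvent;
import org.osgi.framework.SynchronousBundleListener;

// Illustrative only: the synchronous listener records start order, and a
// worker thread later builds module contexts in exactly that order.
public class OrderedStartExtender implements SynchronousBundleListener {

    private final BlockingQueue<Bundle> startQueue = new LinkedBlockingQueue<Bundle>();

    // Delivered on the framework thread, in the order bundles are started.
    public void bundleChanged(BundleEvent event) {
        if (event.getType() == BundleEvent.STARTED) {
            startQueue.offer(event.getBundle());
        }
    }

    // Body of the worker thread that replaces the normal in-line startup.
    void processLoop() throws InterruptedException {
        while (!Thread.currentThread().isInterrupted()) {
            Bundle bundle = startQueue.take();
            createModuleContext(bundle);
        }
    }

    private void createModuleContext(Bundle bundle) {
        // Hypothetical: build the Blueprint module context for this bundle.
    }
}
```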
Comment author: hal.hildebrand@oracle.com
I'm summarizing an IM conversation I had with Erin in the hope of further clarification.
The primary issue is that the startup of the Blueprint module, in the asynchronous case (the default), is not guaranteed to follow the startup order of the bundle to which the module belongs. Thus, construction of a module for a bundle at start level 3 can begin after that of a module for a bundle started at start level 8.
We must be clear that we are not talking about the ordering of how the dependencies of the module are resolved. This bug is merely about guaranteeing that the start of the construction of each module obeys the start ordering of the bundle to which it belongs.
WRT synchronization, the Blueprint extender's bundle listener is synchronous. Therefore, in the obvious implementation of this contract - i.e. a queue - the extender is already in a synchronized block when it places the request to construct the Blueprint module in the queue. Thus, there is no extra synchronization overhead there.
On the reading end of the queue, a thread pool pulls the start requests, with only a momentary synchronization around the read of the queue. Multiple threads in the pool can then proceed to construct the bundles' modules in parallel, resolving service dependencies of the modules asynchronously, using the service events of the framework.
Yes, in a perfect world there would be no need to rely on the startup order of the bundles, because we would be using service dependencies to determine our ordering. However, because the start level service exists in the framework, users will make use of it in their startup ordering. Some will do so because they cannot resolve their issues with service dependencies. Most will undoubtedly do so because they are misusing the start level service, simply because it is available and no one is preventing them from using it. Regardless of why they rely on start levels, as a vendor I cannot tell customers that they cannot rely on the startup ordering of the bundles because the extender doesn't respect this ordering. Customers hate it when I tell them things like that.
Comment author: Adrian Colyer <acolyer@vmware.com>
Is the start order guarantee enough for your scenario Andy/Hal? Let's suppose that we queue start events as suggested, and that there are multiple threads pulling these events from a queue and processing them. We can guarantee that event A will be pulled from the queue before event B if A was delivered before B, but we can't then guarantee that the thread processing A will get to make progress ahead of the thread processing B - that is down to JVM scheduling. This proposal also doesn't include a property that I had previously thought was important in your scenario: ensuring that all events at start level N had been processed (to conclusion, good or bad), before starting on events at start level N+1. Is that latter requirement still valid?
Comment author: hal.hildebrand@oracle.com
Yes, I believe my understanding of the problem is incomplete. Here is the email explaining the desired semantic by Andy Piper (who is still waiting to get his credentials resolved so he can interact with the bug tracking system).
The point of the queue is so that contexts can have their startup delayed if they are not at the same start level as those currently starting. Imagine you have four bundles B1-B4 at two start levels S1 and S2, i.e.
B1, S1
B2, S1
B3, S2
B4, S2
So the system resolves the bundles and moves to start level 1; B1 and B2 are placed in the queue like so:
B2, B1
The contexts for B1 & B2 are started. B2's context completes startup and is removed from the queue:
B1
The system then increases the start level to 2; note that since startup is asynchronous for the contexts, there is nothing to tell it that B1 has not finished. At S2, B3 and B4 are placed in the queue:
B4, B3, B1
B4 and B3 are not started yet, however, because B1 has not completed, which means S1 is still being processed. B1 then completes and is removed from the queue, and the reader thread starts B4 and B3 because they are now at the lowest start level. Note that since B3 & B4 are essentially started together, B3 could depend on B4 and things would still complete successfully.
And so on.
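A hedged sketch of a queue behaving this way, under the assumption that the extender can query each bundle's start level via the StartLevel service; the class, method, and field names are illustrative only:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import org.osgi.framework.Bundle;
import org.osgi.service.startlevel.StartLevel;

// Illustrative only: defer module construction for bundles at start level N+1
// until every context queued at start level N has completed (good or bad).
public class StartLevelAwareQueue {

    private final StartLevel startLevelService;       // obtained from the service registry
    private final Deque<Bundle> deferred = new ArrayDeque<Bundle>();
    private int inProgressLevel = -1;                  // level currently being built
    private int inProgressCount = 0;                   // contexts still building at that level

    public StartLevelAwareQueue(StartLevel startLevelService) {
        this.startLevelService = startLevelService;
    }

    // Called from the synchronous STARTED event handler.
    public synchronized void bundleStarted(Bundle bundle) {
        int level = startLevelService.getBundleStartLevel(bundle);
        if (inProgressCount == 0 || level == inProgressLevel) {
            dispatch(bundle, level);                   // same level: build in parallel
        } else {
            deferred.addLast(bundle);                  // lower level still building: park it
        }
    }

    // Called whenever a module context finishes construction (successfully or not).
    public synchronized void contextCompleted() {
        if (--inProgressCount == 0 && !deferred.isEmpty()) {
            int next = lowestDeferredLevel();
            for (Iterator<Bundle> it = deferred.iterator(); it.hasNext();) {
                Bundle bundle = it.next();
                if (startLevelService.getBundleStartLevel(bundle) == next) {
                    it.remove();
                    dispatch(bundle, next);            // release the next level as a batch
                }
            }
        }
    }

    private int lowestDeferredLevel() {
        int lowest = Integer.MAX_VALUE;
        for (Bundle bundle : deferred) {
            lowest = Math.min(lowest, startLevelService.getBundleStartLevel(bundle));
        }
        return lowest;
    }

    private void dispatch(Bundle bundle, int level) {
        inProgressLevel = level;
        inProgressCount++;
        // Hand the bundle to a thread pool that constructs its module context.
    }
}
```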
Comment author: andy.piper@oracle.com
Ok, so I finally have my credentials restored so I can comment on this. It's clear that there were different understandings of what I was talking about, and my pursuit of this was based on what I thought was broad agreement about something different. I have changed the title to reflect a minimum bar of clarification.
As Hal points out what I am asking for is a somewhat different semantic to that currently implemented by bundles wrt start levels.
My argument remains that if you are using start levels then you are interested in the ordering of bundle activation that they supply, and that you are therefore likely interested in the "activation" semantics of module contexts also. Specifically, start levels guarantee that bundle activators have run before moving to the next start level. Although activators could fire off new threads, breaking this synchronous contract, I no longer see that as being in the purview of start levels. If you are interested in start levels then you are interested in the synchronous behaviour. Blueprint, however, has async startup built into the model, and my argument is that the semantic people are looking for with activators should be extended in a reasonable way to Blueprint.
Clearly, as pointed out by Glyn and others, fully synchronous startup makes no sense - you could not then have service dependencies of any complexity - so the model I proposed allows for normal asynchronous startup within a start level. Clearly, service dependencies also provide a fine-grained way of ordering startup; however, we have found that at a coarse-grained level this becomes extremely hard to manage, for instance if you want to have phased server states (as most application servers, and UNIX run levels - which I assume are what start levels are modelled on - support).
As to supporting a running system: surely this is also a problem for bundle activators today and for start levels in general? Monotonic phasing only really works at startup and shutdown. Sure, this points to a fundamental deficit in start levels, but as Hal says, I don't want the perfect to be the enemy of the good (enough).
Comment author: Adrian Colyer <acolyer@vmware.com>
I've spent a good portion of today carefully going over the module context lifecycle wrt. this issue.
This bug contains some important changes/clarifications and should be reviewed carefully
catch no. 1: the specification stated that module context creation began on the STARTING event, but in actual fact in Spring DM it begins on the STARTED event, and for good reason: this guarantees that the Activator will have run, and that the Activator and context creation are not potentially running in parallel.
this led to a clarification in the description of the handling of lazy activation in section 5.4.1.6 - specifically, to drive the STARTED event and ensure that context creation consistently happens after activation, the extender needs to call start(START_TRANSIENT) if a service reference triggers creation of a context for a lazily activated bundle:
(in 5.4.1.6)
"In the second case: • The ServiceFactory method getService is invoked on a lazy service (because a client has passed the ServiceReference obtained from the registry to the BundleContext.getService operation). • The Bundle.start(START_TRANSIENT) operation is invoked to force the bundle to start immediately • The normal STARTING event is fired by the framework • The bundle is activated by the framework • The STARTED event is fired and the framework moves the bundle to the ACTIVE state • Synchronous creation of the module context begins
o Any lazy ServiceFactory objects registered during LAZY_ACTIVATION are updated to return the
true services defined in the module blueprint.
o Module context creation completes
• The getService operation returns the true service from the newly created module context"
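A hedged sketch of how a lazy ServiceFactory following the quoted sequence might behave, under the assumption that the extender registered it during LAZY_ACTIVATION and builds the module context synchronously on STARTED; the class and field names are illustrative, not from the spec:

```java
import org.osgi.framework.Bundle;
import org.osgi.framework.BundleException;
import org.osgi.framework.ServiceFactory;
import org.osgi.framework.ServiceRegistration;

// Illustrative only: a placeholder ServiceFactory registered by the extender
// for a service defined by a lazily activated bundle.
public class LazyModuleServiceFactory implements ServiceFactory {

    private final Bundle moduleBundle;       // the lazily activated bundle
    private volatile Object trueService;     // updated once the module context is created

    public LazyModuleServiceFactory(Bundle moduleBundle) {
        this.moduleBundle = moduleBundle;
    }

    public Object getService(Bundle client, ServiceRegistration registration) {
        try {
            // Force immediate activation: the framework fires STARTING, runs
            // the activator, then fires STARTED. Per the quoted text, the
            // extender then creates the module context synchronously and
            // updates trueService before this call returns.
            moduleBundle.start(Bundle.START_TRANSIENT);
        } catch (BundleException e) {
            throw new RuntimeException("Could not activate module bundle", e);
        }
        return trueService;
    }

    public void ungetService(Bundle client, ServiceRegistration registration, Object service) {
        // No cleanup needed for this sketch.
    }
}
```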
Now we have the nice property that Bundle activation always brackets module context creation: activators run before the context is instantiated, and after it has been destroyed.
I have added a new section 5.2.1.2 "Start Level Service" :
"5.2.1.2 Start Level Service The Blueprint service extender bundle manages the lifecycle of the module contexts that it creates. The creation process for a given module context begins upon receipt of either a LAZY_ACTIVATION or STARTED event for the corresponding bundle. Creation is considered complete once the module context reaches either the CREATED or FAILED state. If the extender bundle receives an event (LAZY_ACTIVATION or STARTED) for a bundle at start level n, and the extender is still in the process of creating module contexts for bundles with a start level < n, then the event is queued for processing once creation of all module contexts for bundles with lower start levels has completed. Thus creation of module contexts for bundles at a given start level can proceed in parallel, but creation of module contexts at level n is deferred until completion of the creation process for any module contexts at a start level < n. "
In addition, a new section 5.2.1.3 provides lifecycle diagrams of the kind found in the core specification to explain the module context lifecycle (see attached images).
Leaving this bug open for discussion, hopefully we can reach agreement and close out on the next 124 call (Tuesday 17th).
Comment author: Adrian Colyer <acolyer@vmware.com>
Created attachment 85 lifecycle diagram, part 1
Attached file: lifecycle.jpg (image/jpeg, 120321 bytes) Description: lifecycle diagram, part 1
Comment author: Adrian Colyer <acolyer@vmware.com>
Created attachment 86 Lifecycle diagram, part 2
Attached file: lifecycle-2.jpg (image/jpeg, 104108 bytes) Description: Lifecycle diagram, part 2
Comment author: @bjhargrave
- this led to a clarification in the description of the handling of lazy activation in section 5.4.1.6 - specifically, to drive the STARTED event and ensure that context creation consistently happens after activation, the extender needs to call start(START_TRANSIENT) if a service reference triggers creation of a context for a lazily activated bundle:
Just loading a class from the bundle will trigger starting the bundle. For example, the service implementation class. The spec should not require the extender to call start. The spec should not require the extender to have AdminPermission(*,EXECUTE). The extender is not a management agent.
(in 5.4.1.6)
"In the second case: • The ServiceFactory method getService is invoked on a lazy service (because a client has passed the ServiceReference obtained from the registry to the BundleContext.getService operation). • The Bundle.start(START_TRANSIENT) operation is invoked to force the bundle to start immediately
No. As above, just load a class from the bundle to trigger the start. By the time the class object is returned, the bundle will have been started and the STARTED event fired.
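A minimal sketch of the class-load trigger described here, assuming the extender knows some class defined by the bundle; the implementation class name is purely hypothetical (the next comment explains why that assumption is problematic):

```java
import org.osgi.framework.Bundle;
import org.osgi.framework.ServiceReference;

// Illustrative only: trigger lazy activation by loading a class through the
// bundle's own class loader instead of calling Bundle.start(START_TRANSIENT).
public class ClassLoadTrigger {

    public void activate(ServiceReference reference) throws ClassNotFoundException {
        Bundle bundle = reference.getBundle();
        // Loading any class defined by a lazily activated bundle forces the
        // framework to activate it; STARTING and STARTED have been fired by
        // the time loadClass returns.
        bundle.loadClass("com.example.SomeServiceImpl"); // hypothetical class name
    }
}
```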
Comment author: Adrian Colyer <acolyer@vmware.com>
Just loading a class from the bundle will trigger starting the bundle. For example, the service implementation class. The spec should not require the extender to call start. The spec should not require the extender to have AdminPermission(*,EXECUTE). The extender is not a management agent.
No. As above, just load a class from the bundle to trigger the start. By the time the class object is returned, the bundle will have been started and the STARTED event fired.
This is the option I originally wanted to go with, however I ran into two problems:
1) There is no guarantee that the service implementation class is defined by the class loader of the bundle being started. It is not uncommon to have a module context that defines a set of components where none of the component types is actually defined by that bundle - for example, an "infrastructure" bundle that defines components relating to transaction management, persistence, etc. All of the classes loaded will be either RI types or enterprise library types from the relevant enterprise bundles. So it is certainly conceivable that a module-context-defining bundle could even have no classes on its bundle classpath.
2) Suppose the bundle does define at least one class - how do we know what that class is, and how do we know we can safely load it without side effects (e.g. from static initializers)?
Therefore I concluded that since (a) you can't artificially force a class load, and (b) you can't rely on a class load happening of its own accord, you need to use some other mechanism.
Very happy to receive suggestions of workarounds that would avoid these constraints...
Comment author: hal.hildebrand@oracle.com
I thought it would be a good idea to summarize what I said on the spec call this morning.
My feeling is that what is being proposed changes the semantics of the underlying notion of "start". The proposal is that all contexts at start level N have their dependencies satisfied before moving on to start level N+1. The argument is that in the framework, the bundle activator is guaranteed to have completed before the framework can move on to the next start level, because the bundle activator methods are synchronous.
However, the proposal confuses the completion of the bundle activator's start() method with the satisfaction of all required services. Even without forking threads to satisfy dependencies, a bundle whose activator has completed can still be waiting for further services that must be present before its own services are able to "start" (i.e. register themselves in the service registry). The mechanism for this asynchronous satisfaction of dependencies is the service event mechanism and the ServiceTracker, which wraps it.
This asynchronous use of service events to satisfy dependencies is quite common - I would venture to guess that it is actually more common in the real world than an activator synchronously blocking while waiting for dependencies. My argument is simply that blocking the completion of the bundle activator is a very big issue in the OSGi framework and, given the unpredictable startup order in OSGi, something that is seriously frowned upon.
Further, there doesn't seem to be any good reasoning behind the claim that a service from a bundle at start level N should not depend on a service from a bundle started at start level N + x. Such dependencies work perfectly fine today, and it's not clear why they should be prevented from working with the Blueprint extender in the future. This proposal would completely eliminate what seems to be a perfectly legal and valid pattern.
To summarize:
1) Asynchronous satisfaction of dependencies is the norm, not the odd case
2) BundleActivator.start() does not imply dependencies have been satisfied
3) Dependencies from lower start levels to higher start levels are perfectly legal and useful
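A minimal sketch of the common pattern referred to above, assuming an activator that returns immediately while a ServiceTracker satisfies a dependency asynchronously via service events; the DataSource dependency and class names are illustrative:

```java
import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceReference;
import org.osgi.util.tracker.ServiceTracker;

// Illustrative only: start() does not block waiting for the dependency.
public class AsyncDependencyActivator implements BundleActivator {

    private ServiceTracker tracker;

    public void start(BundleContext context) {
        tracker = new ServiceTracker(context, "javax.sql.DataSource", null) {
            public Object addingService(ServiceReference reference) {
                Object dependency = super.addingService(reference);
                // Dependency present: this bundle's own services could be
                // registered here, possibly long after start() returned.
                return dependency;
            }
        };
        tracker.open();
        // start() returns here; the framework is free to move to the next
        // start level even though the dependency may not yet be satisfied.
    }

    public void stop(BundleContext context) {
        tracker.close();
    }
}
```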
Comment author: Adrian Colyer <acolyer@vmware.com>
Following on from Hal's comment, and after staring at the specified behavior added to the spec to explore this option, we agreed on the call to remove the start level interaction from the specification. Leaving it as an option simply seemed to give us the worst of both worlds: two sets of semantics to maintain and support.
Comment author: Adrian Colyer <acolyer@vmware.com>
Change committed in svn revision 6175.
Original bug ID: BZ#1032 From: andy.piper@oracle.com Reported version: R4 V4.2
Blocker for: BZ#1187