jsr107 / jsr107spec

JSR107 Cache Specification
Apache License 2.0
413 stars 164 forks source link

Check of ClassLoader when putting objects in to a cache #17

Closed ChristerF closed 12 years ago

ChristerF commented 13 years ago

I propose that we limit puts so that keys and values that are put in to a cache owned by a particular CacheManager must be loadable through the ClassLoader associated with the CacheManager in question - at a minimum for the byValue variant of caches.

Here is a rough sketch:

checkValidClassLoader(Object.getClass().getClassLoader()) would be called for the key and the value being put.

boolean CacheManager.checkValidClassLoader(ClassLoader classLoader) { ClassLoader current = getClassLoader(); if (current == classLoader) return true; while ((current = current.getParent()) != null) { if (current == classLoader) return true; } return false; }

smillidge commented 12 years ago

How does that work with the Thread context classloader as that is typically used in JEE containers to try and load classes and to obtain the correct classloader based on context.

ChristerF commented 12 years ago

In most JEE cases, the thread's context ClassLoader would be a ClassLoader associated with a deployed application and the expectation is that your CacheManager is associated with a particular application and its ClassLoader. So, this scheme should work by default in JEE and prevent obscure errors. It would be interesting if you can come up with a case where it would not work.

smillidge commented 12 years ago

I'm not sure a CacheManager will be associated with a specific application. Especially in the case of a long lived DataGrid where we may have many applications accessing the same data in the grid. I suppose it could be the case that the way to get to the grid will be via an application loaded cache manager but I'm not sure it will always be true. I can envisage creating a CacheManager and placing in JNDI in the JEE case which can then be accessed by many deployments.

I think what I was trying to get at is that a Cache can just use the Thread Context ClassLoader to deserialize data out of the cache but that context ClassLoader doesn't have to be the ClassLoader of the CacheManager itself.

Steve Millidge Director C2B2

Providing the foundations for Enterprise Scale Java.

T: 08450 539457 M: 07920 100626 W: www.c2b2.co.uk E: smillidge@c2b2.co.uk

-----Original Message----- From: ChristerF [mailto:reply@reply.github.com] Sent: 25 September 2011 18:36 To: smillidge Subject: Re: [jsr107spec] Check of ClassLoader when putting objects in to a cache (#17)

In most JEE cases, the thread's context ClassLoader would be a ClassLoader associated with a deployed application and the expectation is that your CacheManager is associated with a particular application and its ClassLoader. So, this scheme should work by default in JEE and prevent obscure errors. It would be interesting if you can come up with a case where it would not work.

Reply to this email directly or view it on GitHub: https://github.com/jsr107/jsr107spec/issues/17#issuecomment-2192007

ChristerF commented 12 years ago

Steve,

I have been battling with these questions wrt to a well known data grid product.

As for a data grid, it is very true that one of the most important aspects is the ability to share data with multiple applications. As powerful as that ability is, as error prone and complicated it can be in the light of multiple ClassLoaders.

There is nothing stopping an implementation of JSR 107 to hand out a distinct CacheManager instance for each application and handling the sharing with other applications and a data grid as an implementation detail. And this particular proposed feature would enable a decent level of error checking.

For a single JEE server, a "shared" data grid really would like to be hosted in a "base" ClassLoader to all applications sharing the data grid. Ideally, there would be a deployment module for the data grid describing the cache semantics and the classes used for keys and values which serves as a dependency for the JEE modules that would access the data grid . I'm thinking such a concept is likely out of scope for JSR 107 and JEE 7.

In the scenario you describe, you would create a CacheManager in one JEE app, put it in JNDI such that another JEE app can get hold of it. Then the only "safe" classes you could put in that cache would be classes in a shared ClassLoader to the two JEE apps, i.e. the system ClassLoader. If you put any other class you are at the mercy of the applications ClassLoader and an arbitrary alignment of classes between different ClassLoaders. Unfortunately, this represents best practice for several data grid products out there. I'm all for trying to create a flexible, yet consistent way of handling these issues in JSR 107. The check I proposed in this issue is all about ensuring consistency and predictability in the light of multiple ClassLoaders.

Looking forward to your further comments!

Thanks, Christer

smillidge commented 12 years ago

Christer,

I agree Classloading in data grids is much harder as you need to support Entry Processors, Parallel Queries etc. all of which have to operate on the deserialized data in the data grid node away from the application JVM. Therefore the classes in the grid need to be accessible from the grid's classloader.

However for the JSR 107 case I can't see any problem first attempting to use the Thread Context Classloader to deserialize an object when returning from it from a get and then using the CacheManager's classloader if that fails. Therefore I think this check is too restrictive.

Steve Millidge Director C2B2

Providing the foundations for Enterprise Scale Java.

T: 08450 539457 M: 07920 100626 W: www.c2b2.co.uk E: smillidge@c2b2.co.uk

-----Original Message----- From: ChristerF [mailto:reply@reply.github.com] Sent: 26 September 2011 06:58 To: smillidge Subject: Re: [jsr107spec] Check of ClassLoader when putting objects in to a cache (#17)

Steve,

I have been battling with these questions wrt to a well known data grid product.

As for a data grid, it is very true that one of the most important aspects is the ability to share data with multiple applications. As powerful as that ability is, as error prone and complicated it can be in the light of multiple ClassLoaders.

There is nothing stopping an implementation of JSR 107 to hand out a distinct CacheManager instance for each application and handling the sharing with other applications and a data grid as an implementation detail. And this particular proposed feature would enable a decent level of error checking.

For a single JEE server, a "shared" data grid really would like to be hosted in a "base" ClassLoader to all applications sharing the data grid. Ideally, there would be a deployment module for the data grid describing the cache semantics and the classes used for keys and values which serves as a dependency for the JEE modules that would access the data grid . I'm thinking such a concept is likely out of scope for JSR 107 and JEE 7.

In the scenario you describe, you would create a CacheManager in one JEE app, put it in JNDI such that another JEE app can get hold of it. Then the only "safe" classes you could put in that cache would be classes in a shared ClassLoader to the two JEE apps, i.e. the system ClassLoader. If you put any other class you are at the mercy of the applications ClassLoader and an arbitrary alignment of classes between different ClassLoaders. Unfortunately, this represents best practice for several data grid products out there. I'm all for trying to create a flexible, yet consistent way of handling these issues in JSR 107. The check I proposed in this issue is all about ensuring consistency and predictability in the light of multiple ClassLoaders.

Looking forward to your further comments!

Thanks, Christer

Reply to this email directly or view it on GitHub: https://github.com/jsr107/jsr107spec/issues/17#issuecomment-2195154

ChristerF commented 12 years ago

While it is reasonable for the "by value" case to as a best effort use the TCCL, it will for the "by reference" case lead to potential ClassLoader leakages to allow objects from a different ClassLoader.

In the current reference implementation, serialization and de-serialization will occur using the ClassLoader associated with the CacheManager. Making sure the object you put there belongs to the same ClassLoader is not a far stretch IMHO.

/Christer

smillidge commented 12 years ago

I'm just concerned about the implications for when we look at integration with JEE.

I may be wrong but the implication for JEE is that CacheManager's must be deployed with the application that uses them rather than as a container level resource.

The use case I'm thinking about is an application using a CacheManager backed by a large memcached cluster. I could envisage configuring this as a Container level CacheManager available via JNDI like a datasource. With multiple apps pushing data into memcached. With the cache using TCCC to save and load data.

------Original Message------ From: ChristerF To: Steve Millidge Subject: Re: [jsr107spec] Check of ClassLoader when putting objects in to a cache (#17) Sent: Oct 3, 2011 3:46 PM

While it is reasonable for the "by value" case to as a best effort use the TCCL, it will for the "by reference" case lead to potential ClassLoader leakages to allow objects from a different ClassLoader.

In the current reference implementation, serialization and de-serialization will occur using the ClassLoader associated with the CacheManager. Making sure the object you put there belongs to the same ClassLoader is not a far stretch IMHO.

/Christer

Reply to this email directly or view it on GitHub: https://github.com/jsr107/jsr107spec/issues/17#issuecomment-2280534

Sent from my BlackBerry® wireless device

gregrluck commented 12 years ago

What is proposed by Christer seems sensible. We are allowing a class loader to be set when a CacheManager is created. See the Caching class.

Quoting from the Javadoc:

Now isn't all that Christer is proposing is to check the keys and values on the way in? It seems to me that this honours the fail-fast principle and will prevent errors. I propose we add the following to Cache methods which accept puts:

"Both the key and value must be loadable by the CacheManager's class loader otherwise a CacheException is thrown."

On Steve's points about what happens when no class loader is defined, we still define one for the CacheManager, or we were planning to.

We currently allow an implementation one of two options:

 *     Thread.currentThread().getContextClassLoader();
 *     getClass().getClassLoader();

We probably should lock this down. Do you have a view on what it should be?

yannis666 commented 12 years ago

I think the way to address this is to consider what classloader we would want to use if we were specifying it explicitly. Traditionally, at least in our experience with Coherence, in an EE environment the way this is addressed is by placing classes of objects to be shared by different applications in a common, base classloader. Thus if 2 WARs were to share the same cache, classes for both key and value would be placed in the EAR classloader. If not, and say a value class were defined in war1 but NOT in war2, then errors would occur. So, given the above, I would suggest using the parent (ear) classloader to avoid accidentally storing objects in the war classloader but not in the ear. This would not be TCCL, and whether it were getClass.getClassloader would depend on whether the caching classes were located in the ear. Traditionally Coherence solved this problem by recommending placing the Coherence jar in the ear.

Going forward, Oracle is introducing the concept of a GAR, analogous to that of a WAR to make the class loading behavior explicit.

I don't understand how placing the cache(manager) at the system level would work. If application 1 stores Class1 defined in its WAR, what would happen if application 2 did a get on that object? Particularly in an EE environment it seems to me that all applications that want to share a cache must have all classes stored in that cache in their classloader.

Incidentally, if store by value is in use, the classes could be (consistent) copies of each other (i.e. not from a same parent classloader) but if using by reference they must be in the same parent classloader.

chrisdennis commented 12 years ago

I'm very skeptical about both the value and correctness of this check.

Assuming I understand Christer's original proposal he would like a method CacheManager.checkValidClassLoader(Object incoming) that can determine, knowing the ClassLoader instance associated with the CacheManager instance, whether or not a given object graph can be de-serialized at a later point in time by that loader. Christer's current proposed method doesn't fulfill those requirements. What's more, devising such a method that never provokes a false failure is at best exceedingly complicated and expensive, and at worst impossible.

  1. In even a simple scenario the entire object graph has to walked to inspect all the referenced types - this is expensive and requires copious use of reflection (may also fall over in presence of some security policies).
  2. Serialized forms care not one iota about class loader identity - deserialization can be done with a completely unrelated loader (including in a different JVM) as long as the loader has available a class definition which understands the serialized data.
  3. Using writeReplace() the serialized type may bear no relation to the original heap type.
  4. Using custom read logic - the serialized instance may not even be resolved in to a type at all - it may be subsumed by another types deserialization logic.
  5. This all gets even worse when we step outside the bounds of Java serialization and consider other 'serialization' strategies.

Enforcing a requirement like this would seem to force all implementations to make a trade-off between the runtime cost of this check and the number of originally functional use cases they subsequently preclude. This doesn't seem to be a fair decision to place in the implementors lap. I'm much more comfortable with not performing this check, ruling out no use-cases on these grounds and instead failing on retrieve/deserialization when necessary.

ChristerF commented 12 years ago

Chris,

I understand your objections; do you have an alternate strategy that would help prevent leaking ClassLoaders in an application server environment?

Thanks, Christer

chrisdennis commented 12 years ago

I don't really have a workable strategy for that. I'm sure there are many potential things that could be done to prevent these kind of memory leaks but it seems to me that all of them are going to constrain other people who want to do something eminently sensible with the API.

chrisdennis commented 12 years ago

I'm closing this issue right now. If someone can come up with (in a timely manner) an implementable and useful (non restrictive on reasonable use cases) approach here then we can re-open and reconsider.

ChristerF commented 12 years ago

I'd suggest you list the reasonable use cases and the use cases where care should be taken, (i.e when sharing objects/caches between applications deployed in an appserver).

gregrluck commented 12 years ago

Ok with me to close.