jsr107 / jsr107spec

JSR107 Cache Specification
Apache License 2.0
413 stars 164 forks source link

Enable cache retrieval where runtime type enforcement is enabled. #340

Closed jerrinot closed 7 years ago

jerrinot commented 8 years ago

CacheManager has Iterable<String> getCacheNames();.

However there is no way to discover what types were configured for a given cache name. However without knowing a type it's impossible to retrieve a cache from CacheManager.

Related issue: https://github.com/jsr107/jsr107spec/issues/339

cruftex commented 8 years ago

+1

smillidge commented 8 years ago

+1

This is the issue I raised at the JavaOne JCache BOF but couldn't explain very well as I couldn't remember the API !

chrisdennis commented 8 years ago

Thinking out loud (and not really thinking that hard if I'm honest). Would it be better/cleaner to have a type agnostic retrieval method that returns Cache<?, ?>? Then the cache could be interrogated for it's type by retrieving it's configuration? I vaguely recall this being suggested when we introduced the option of stricter type checking on caches.

cruftex commented 8 years ago

Would it be better/cleaner to have a type agnostic retrieval method that returns Cache<?, ?>?

The actual interface is type agnostic. You can retrieve arbitrary typed caches with it. However, the semantic isn't type agnostic. The semantic says that its technically a Cache<Object,Object>, which means the cache is not restricting and checking the types (which is only true for storeByReference in reality anyways...).

I think we should investigate whether we can make getCache(name) type agnostic and not being identical in semantic of getCache(name, Object.class, Object.class). That means getCache(name) should return any cache, if known. There are good reasons for it:

cruftex commented 8 years ago

In short, the mismatch is:

The signature is now:

  <K, V> Cache<K, V> getCache(String cacheName);

So I can request a cache of arbitrary types like:

  Cache<String, BigDecimal> cache = manage.getCache(cacheName);

But getCache() requires that the configured cache types are Object, Object.

We should make getCache() return any cache with that name.

brianoliver commented 8 years ago

This can’t really be allowed as any cache configured with a type can then be changed to some other type at runtime, without type checking.

eg: If an application configures a Cache with Strings for keys and Dogs for values, but then you allow

<K, V> Cache<K, V> getCache(String cacheName); To work on any cache, you then permit the following:

Cache<String, Cat> catsCache(“animals-cache”);

Cat mrFits = …;

catsCache.put(“Mr Fits”, mrFits);

ie: You completely allow a Dog cat to be corrupted with a Cat.

You either have type safety, or you have none. There’s no “half way”.

So I can request a cache of arbitrary types like:

Cache<String, BigDecimal> cache = manage.getCache(cacheName); But getCache() requires that the configured cache types are Object, Object.

We should make getCache() return any cache with that name.

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/jsr107/jsr107spec/issues/340#issuecomment-218433150

cruftex commented 8 years ago

This can’t really be allowed as any cache configured with a type can then be changed to some other type at runtime, without type checking.

Yes, you can do that. But that's no worse then before. You can do right now:

Cache<String, Dog> dogCache = manager.getCache("cache");
Cache<String, Cat> catCache = manager.getCache("cache");
dogCache.put("sammy", new Dog());
catCache.get("sammy"); // => boom!

Or:

 ((Cache) manager.getCache("cache",String.class, Dog.class)).put("sammy", new Cat()); 

There is no runtime type checking, never was. A cache needs to implement it explicitly.

cruftex commented 8 years ago

I can understand Brains' argument insofar, that the current semantic may save a user from a mistake, but it is no strict enforcement. On the other hand the current semantic makes simple things complicated or impossible. E.g. how would a piece of code look like that clears all caches of a cache manager?

brianoliver commented 8 years ago

There is no runtime type checking, never was. A cache needs to implement it explicitly.

This is not true. Implementations can provide runtime checking.

You’re talking about changing the semantics of the API.

If you don’t want any runtime checking, then we should propose it and remove it completely in version 2.0 (but maintain back-wards compatibility as required).

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/jsr107/jsr107spec/issues/340#issuecomment-218483873

cruftex commented 8 years ago

My:

There is no runtime type checking, never was. A cache needs to implement it explicitly.

Should be read as:

You cannot do runtime type checking by the API, because of type erasure. There is no way to enforce correct typing on this level. Of course a cache can implement runtime type checking and there is nothing bad about that.

Anyhow, it was an idea, that doesn't mean that we do not need to carefully check all the implications and the effort that might bring with it. Other ideas?

brianoliver commented 8 years ago

You cannot do runtime type checking by the API, because of type erasure.

Correct. There is no way to enforce correct typing on this level.

Unless the types are specified and/or configured, which allows implementations to perform runtime checking. Of course a cache can implement runtime type checking and there is nothing bad about that.

If we remove the type parameters all together, implementations won’t even be able to achieve runtime type checking. There’d be no way to provide types in a standard manner for implementations to check them. Suddenly “type-checking” would become an implementation-specific, non-standard feature.

A lot of thought when into the current design, backed by a lot of feedback from other groups that had to face the same challenges with type-erasure.

eg: We talked extensively with the Java Language team and the JPA Experts.

Anyhow, it was an idea, that doesn't mean that we do not need to carefully check all the implications and the effort that might bring with it. Other ideas?

— You are receiving this because you commented. Reply to this email directly or view it on GitHub https://github.com/jsr107/jsr107spec/issues/340#issuecomment-218510382

cruftex commented 8 years ago

@brianoliver: Your email replies end up crippled in the GitHub issue.

If we remove the type parameters all together, implementations won’t even be able to achieve runtime type checking.

That is not what I am suggesting. Please read from the beginning.

We talked extensively with the Java Language team and the JPA Experts.

That means you neglect the fact, that after the API is implemented and now in use for a while, that there is more knowledge about it then before?

New aspects, e.g. with the EE integration, come in. Containers have different requirements on the API then an application.

On the other hand I can totally understand a general resistance of changing API semantics, even the tiniest. This comes always with pain you cannot for see in advance.

brianoliver commented 8 years ago

@cruftex

Have you actually tried doing the following?

Cache<String, Dog> dogCache = manager.getCache("cache"); Cache<String, Cat> catCache = manager.getCache("cache");

From what I recall, the TCK tests indicate this is not possible, which contradicts what you're saying.

That is not what I am suggesting. Please read from the beginning.

@cruftex You clearly said..

I think we should investigate whether we can make getCache(name) type agnostic and not being identical in semantic of getCache(name, Object.class, Object.class). That means getCache(name) should return any cache, if known.

Perhaps I'm not doing a great job here. I'm simply attempting to point out the "why" this is not possible and the impact on the API and compatibility. It's not whether I personally like it or not.

If you want to suggest something that works without breaking any implementation or the semantics of existing applications, go for it. In the meantime understand and humbly appreciate the effort that's already been invested here and not simply discount it as "something we didn't think of" or the guidance and advice from other Experts in the community or Java Platform Team, including the language designers that have been before us. Also appreciate that we invested a large amount of time in the "what if" scenarios, in which your suggestion was already covered.

@cruftex

That means you neglect the fact, that after the API is implemented and now in use for a while, that there is more knowledge about it then before?

Totally incorrect. At no point have I done this and I find it personally offensive to suggest as such.

Again, as Leads we invested a large amount of time thinking and trailing many of the scenarios you're suggesting. The API wasn't some naive effort. We invested a lot of time reaching out to groups that have done this before (and thus seen the impact), together with the Java Language Team, including the changes you're suggesting.

As maintainers we have a duty to adhere to the rules of the JCP and the Java Platform. Those rules includes strict adherence to avoiding the possibility of breaking the API or existing semantics. This is one of the very foundations of Java development.

At this point I believe your suggestions break these rules, or change the behavior / intent of the specification which means they can't be achieved in a maintenance release. Whether I like a suggestion or not, whether I like the rules or not, those are the rules.

Ultimately I feel you're neglecting the fact that there are six or seven implementations of this specification, with potentially hundreds / thousands of systems sitting on it. While it's very easy to make breaking/radical changes when a product / project has few or seemingly no customers, the impact of even the smallest change to a standard comes at a huge cost.

@cruftex

On the other hand I can totally understand a general resistance of changing API semantics, even the tiniest.

This is not a general resistance. Again, there are rules that prevent the changes your suggesting.

I'm certainly not against changes. I have a list of things that I would have never done in JCache and those who know me, know my frustration with the things that are in the API which I still personally believe should not be. Over the years I've invested a large amount of effort to simplify the API, not make it larger, add new features or make it more complicated. I invested more time removing "features" from the "feature soup", so that it was at least consistent, implementable and testable, rather than adding them. There's definitely a few rough edges, but it's not that bad.

If I had my way, JCache 2.0 would have even less features and methods, so it's more widely applicable to a broader range of implementations.

This comes always with pain you cannot for see in advance.

For this case, it's complete non-sense. What you're suggesting had been considered and not just in passing. Furthermore, my position on what should not be in JCache hasn't changed since it's release. In fact, it's only strengthen my views.

Moving forward I'd strongly urge you to arrange your thoughts in terms of "what can be done now" for a maintenance release v's "what has to be done in the future".

Doing this we can have constructive conversations about fixes that need to be made right now and things that we can fix in the future which need broader input, investment and discussion.

cruftex commented 8 years ago

My proposal, changing the semantics of getCache(name), is not possible. This will break existing applications and implementations.

There are technical ways to introduce the requested functionality in a backwards compatible way. Starting a conversation about this, needs some prioritization and directions from the Spec Leads first.

The types can be requested via JMX via the CacheMXBean.getValueType() and CacheMXBean.getKeyType() methods. This requires that management is enabled, which can be done, via CacheManager.enableManagement by knowledge of the cache name.

The above holds for the Specification alone. I analyzed various implementations (RI, coherence, hazelcast, ehcache, infinispan, caffeine, cache2k,.... ) No implementation except Oracle Coherence is registering with the platform mbean server at the moment. I opened this issue for that: https://github.com/jsr107/jsr107tck/issues/83

cruftex commented 8 years ago

I think the issue will pop up again, however, I don't see an agreeable approach to address this in 1.x.

The problem was raised because there is the general question how annotation an imperative API interacts (or does not). This is further addressed by: https://github.com/jsr107/jsr107spec/issues/341

Tagging with JCache Next, to discuss and check back for a new cache standard. Closing anyway since we will currently not do something here.

gregrluck commented 7 years ago

Lets keep thinking about this. The current behaviour is very framework unfriendly.

ben-manes commented 7 years ago

@gregrluck Why were type tokens not adopted in the API? The class literal is very unfriendly and the problem with generics has been solved since 2006. Its been used widely (Guice, Jackson, Jax-rs, etc) and baked into other JVM languages to reify generics.

gregrluck commented 7 years ago

Ben, maybe because we didn't think about it. Can you add some more detail?

ben-manes commented 7 years ago

A good implementation to read is Guava's TypeToken. The details in Gafter's blog linked above is the original post regarding this technique.

In jax-rs this is called GenericEntity and GenericType. This allows a client to call to return a generic type, e.g.

List<User> users = client.request(MediaType.APPLICATION_JSON)
    .get(new GenericType<List<User>>() {});

Or in Guice a TypeLiteral could be used in a binding statement, e.g.

bind(new TypeLiteral<List<String>>() {}).toInstance(new ArrayList<String>());

For Jackson's TypeReference this is used similarly,

List<User> users = mapper.readValue(json, new TypeReference<List<User>>() {});

This same API design problem occurred in Cache2k, where I stepped in to offer the same advice. Jens then added a new builder to take advantage of this pattern,

Cache<Long, List<String>> cache = new Cache2kBuilder<Long, List<String>>() {}.build();

As you can see, it looks similar across libraries. While Java doesn't support List<String>.class, you can capture the generic type of the parent's class. The idiom is to force subclassing with an anonymous class. Usually one tries to not overuse it to avoid creating many classes, as it would pollute perm gen. Most APIs try to minimize usage in the APIs, e.g. Guice's provider method or jax-rs api method, because the class definition can be inspected. Both variations take advantage of how classes capture generic types. So one can often hide the details, but when it has to be passed in as a parameter then using "Gafter's Gadget" is necessary.

I would call this an advanced technique, but well worn after a decade of use. I personally like how Guice and Jackson use it, e.g. passing class literals when possible for the common case but allowing super type tokens as needed. I'd expect experienced Java developers to have used it, even if by copying a library tutorial. In Scala this is called ClassTag (replacing Manifest), in Kotlin reified type parameters, and similar across other JVM languages.

brianoliver commented 7 years ago

We actually looked at and prototyped this quiet a bit, including using TypeLiterals etc.

There are numerous challenges, the biggest being that none of these things work very well across a distributed implementations. Given JCache is trying to be topology and implementation agnostic, the use of anonymous and non-serializable TypeLiterals for "in-process" cache implementations would essentially prevent distributed cache implementations providing the same solution. More specifically, while we could make this work for the RI, no distributed cache vendor could make this work for distributed / multi-process implementations in a meaningful manner.

Here's an example: Consider two independent applications that share a common "foo" distributed cache. One application may define the "foo" cache as Cache<String, Person> and another may define the "foo" cache as Cache<String, Employee>. Both are completely legal definitions, but how are the types compared across process boundaries? Do TypeLiterals actually help?

Based on our experiments, it turns out they don't. We're in fact no better off that using regular types as the anonymous TypeLiterals not only need to be passed (serialized) across process boundaries, they also need to be comparable in order to ensure type compatibility/assignability.

Things get even more complicated if one application simply wants to read from a Cache, say by defining the cache as Cache<String, ? super Employee>. TypeLiterals can't solve this challenge (either), which basically means they are providing very little benefit at all. The Java Language allows these types of declarations and are clearly compatible, but TypeLiterals actually make such declarations almost impossible. Hence we've "erred" on the side of making it "just-safe-enough" without trying to work around the Java type system.

Ultimately things like TypeLiterals are an attempt to work around the lack of Reified Types in the Java Platform.

In someways, we probably would have been better not having this capability at all. That way we could leave it up to the developer to ensure type-safety, typically by convention, as JPA basically does.

There's no easy answer. What works for in-process Caches almost certainly doesn't work for multi-process/distributed Caches and vice-versa. Worse, TypeLiterals can make things more complicated than they need to be without them.

gregrluck commented 7 years ago

To be usable the specification should support use by frameworks and third party applications working with JCache. We should provide a way to interrogate a CacheManager, retrieve caches and use them.

You can call cache.getCacheConfiguration(). This gives rich information e.g. cacheConfiguration.getKeyType and just what a framework would need. However if runtime types have been set, you cannot get the cache in the first place. getCache(String name) will throw an IllegalArgumentException.

Now, they could use CacheMXBean.getValueType() and CacheMXBean.getKeyType() methods. This however requires that management is enabled and is awkward.

I propose we should remove the runtime checking from <K, V> Cache<K, V> getCache(String cacheName);. If you want it you can use <K, V> Cache<K, V> getCache(String cacheName, Class keyType, Class valueType);

This is a permissive change which will not break end user code. Implementations will have to change to drop type checking here.

The negative consequences are small. If you try to put the wrong types in when you mutate the cache runtime checking will still be enabled. You will get a ClassCastException.

I do not see a problem applying this to 1.1.

JCP 2.0 states: "All changes proposed by the ML shall make their way into the Specification either through the Maintenance Release process (described below) or through a new JSR. Changes appropriate for a Maintenance Release include bug-fixes, clarifications of the Specification, changes to the implementation of existing APIs, and implementation-specific enhancements. Changes introduced in Maintenance Releases – for example, modifications to existing APIs or the addition of new APIs - must not break binary compatibility as defined by the Java Language Specification. Changes that would break binary compatibility should therefore be deferred to a new JSR."

This change fixes an interoperability bug.

IllegalArgumentException would be removed. This is a RuntimeException and would not break binary compatibility.

gregrluck commented 7 years ago

Spoke to Brian about this today and reminded him of the problem and walked him through the above. He agrees and said that Coherence has already added this relaxation.

Plus I see we have thumbs up and no thumbs down.

So going ahead with the change.

jerrinot commented 7 years ago

@gregrluck isn't the old behavior enforced by TCK?

gregrluck commented 7 years ago

Yes it was. I also changed the RI to not check it and the TCK test to check that it was not throwing an IllegalArgumentException. All committed and pushed.

I have recorded the change in the changelog for the Spec because it is a spec change. See https://docs.google.com/document/d/14BvLUyv5U3zhsbIGwA4hkLfFieBkaCPfaHazh_yQGSY/edit page 145.

jerrinot commented 7 years ago

cool, thanks for explanation!

brianoliver commented 7 years ago

As mentioned by @gregrluck, Coherence allows both strict and a more relaxed type checking semantics. There are situations where you require both, so relaxing the spec and tck shouldn't be a problem. Existing applications expect strict type checking. They will still work and will be binary compatible. Applications using JCache 1.1 will be able to take advantage of relaxed type checking.

The best of both worlds!