jsr107 / jsr107spec

JSR107 Cache Specification
Apache License 2.0

EntryProcessor Requirement? #124

Closed gregrluck closed 11 years ago

gregrluck commented 11 years ago

From Spec Leads + Terracotta

The EntryProcessor functionality seems strange in two ways. Firstly, and most fundamentally, this seems like a feature that doesn't belong on a cache, but would instead be better suited to a data grid or a store product. Caches are designed to cache an authoritative data source; in that sense it seems bizarre to have any mutative methods that aren't focused on keeping the cache in sync with its authority.

I understand that many of the concurrent-map-like methods here also fall into that camp, but they don't have such an implementation overhead. On the implementation-overhead front, the way that the API is currently constructed means the implementor has a bizarre contract to meet (make the execution of this big method, which might perform multiple arbitrary operations, atomic, which will probably require locking) and the user has a bizarre and hard-to-express set of limitations too: basically, don't do anything that would trigger a deadlock (no interaction with any caches).

A simpler approach to me would be to do something like what is supported in JDK 8 now:

AtomicReference.updateAndGet(UnaryOperator<V> updateFunction);

This puts a simple constraint on the user (the method must be nullipotent, aka "side-effect-free") and can be implemented in a much more performant way (a CAS loop).
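The CAS-loop idea can be sketched against java.util.concurrent's ConcurrentMap, which exposes the same conditional primitives (putIfAbsent, replace) a cache provider could build on. The class and method names below are illustrative, not part of any JSR-107 draft:

```java
import java.util.concurrent.ConcurrentMap;
import java.util.function.UnaryOperator;

public class CasUpdate {
    // Sketch of the CAS-loop approach: retry a conditional replace until it
    // succeeds. The update function must be side-effect-free because it may
    // be re-executed under contention.
    public static <K, V> V updateAndGet(ConcurrentMap<K, V> map, K key,
                                        UnaryOperator<V> fn) {
        for (;;) {
            V oldValue = map.get(key);
            V newValue = fn.apply(oldValue);
            if (oldValue == null) {
                // no mapping yet: install one only if nobody beat us to it
                if (map.putIfAbsent(key, newValue) == null) return newValue;
            } else if (map.replace(key, oldValue, newValue)) {
                // replace succeeds only if the value is still oldValue
                return newValue;
            }
            // lost the race: loop and recompute against the fresh value
        }
    }
}
```

No lock is ever held while user code runs; the price is that the function may run more than once.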

brianoliver commented 11 years ago

From Spec Leads

> The EntryProcessor functionality seems strange in two ways. Firstly, and most fundamentally this seems like a feature that doesn't belong on a cache, but would instead be better suited to a data grid, or a store product.

Last year I studied the use of the term “data grid”. Surprisingly, I found that most vendors who we put in this category do not self-describe this way. The term “data grid” came from the scientific community and has a pretty weird and poorly defined meaning in Java middleware. The cache API we are defining is for enterprise use. The use cases for entry processors are predominantly performance optimisations when mutating cache entries. Their atomic nature typically eliminates network round trips in a distributed system.

Take the example of appending a single item to an ArrayList value with many entries in it. I have seen this use case as reported by Kunal Bhasin. In Terracotta you need to bring the entry across the network, append to it, then put it back.

Another is to add an item to a shopping cart. Rather than bring the complex structured object across the network, update it where it is.

Coherence has standard binary implementations of EntryProcessors which work with POF, listed here: http://docs.oracle.com/cd/E24290_01/coh.371/e22843/com/tangosol/util/processor/package-summary.html These do not require Java deserialization. Take the NumberIncrementor as an example.

In JSR107 this would look like this:

cache.invoke(key, new NumberIncrementor(), "person.age", 1);

where a person exists as a JSON structure in the entry's value.

We then use person.age to navigate the JSON object, read the value and increment it, all atomically and without requiring a ClassLoader or Java Serialization on the server.

Obviously this is not how EHCache works now, but we could do these types of optimisations around restricted object types and operations in the future.
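As a sketch of how such an increment could work without deserializing a user class, assume the entry value is a nested Map standing in for the JSON document. NumberIncrementorSketch and the dotted-path convention below are hypothetical illustrations, not Coherence's actual implementation:

```java
import java.util.Map;

public class NumberIncrementorSketch {
    // Illustrative stand-in for a NumberIncrementor-style processor: the
    // entry value is a nested Map (standing in for a JSON document), and a
    // dotted path like "person.age" selects the field to bump. In a real
    // provider this would run server-side, inside the atomic scope of the
    // cache's invoke operation.
    @SuppressWarnings("unchecked")
    public static Number increment(Map<String, Object> doc, String path, int delta) {
        String[] parts = path.split("\\.");
        Map<String, Object> node = doc;
        for (int i = 0; i < parts.length - 1; i++) {
            // descend one level per path segment, e.g. doc -> person
            node = (Map<String, Object>) node.get(parts[i]);
        }
        String field = parts[parts.length - 1];
        int updated = ((Number) node.get(field)).intValue() + delta;
        node.put(field, updated);  // write the new value back in place
        return updated;
    }
}
```

The point is that only path navigation and integer arithmetic run on the server; no user class is ever loaded there.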

As another example, take a large entry in the cache representing a set of tuples of restricted types. We can perform server-side analytics functions in the future using EntryProcessor, say computing a sum. I am aware of two analytics engines that store data this way, in our product and in competitors'. Once again this can be done without classloaders and can make an enormous difference.
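The kind of computation meant here can be sketched with a hypothetical tuple type; nothing below is a JSR-107 API, it only illustrates the fold an EntryProcessor could run next to the data instead of shipping the entry to the client:

```java
import java.util.List;

public class SumProcessorSketch {
    // Hypothetical tuple type; a real cache entry might hold thousands of
    // these in a single value.
    static final class Tuple {
        final long amount;
        Tuple(long amount) { this.amount = amount; }
    }

    // The server-side work: fold over the tuples in place. Only the final
    // long crosses the network, not the whole entry.
    public static long sum(List<Tuple> tuples) {
        long total = 0;
        for (Tuple t : tuples) total += t.amount;
        return total;
    }
}
```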

The fallback is always to bring the data to the client and execute there. In fact Coherence does this on the client in Standard Edition and only does it on the server in the Grid and Enterprise editions, as a performance value-add.

> Caches are designed to cache an authoritative data source; in that sense it seems bizarre to have any mutative methods that aren't focused on keeping the cache in sync with its authority.

The Spec Leads fundamentally disagree with this. Caches by definition are a copy of data for the purposes of reducing latency (keeping a copy of information closer to the application that uses the data instead of paying the cost of access/retrieval/calculation) and/or for improving application throughput. There is no requirement for any Cache to be bound to an authoritative data source. For example: an internet browser "cache" is not designed or mandated to stay in sync with the original source of web content (aka the server containing the authoritative data). While there may be some scenarios in which a Cache may, should, ideally or must be kept "in sync" with one or more authoritative sources, Caches are not required to do this.

The purpose of an Entry Processor is to allow atomic, low-latency updates of a Cache Entry. Keeping an entry in a cache in sync with an underlying authority (system of record) is orthogonal to the purpose of Entry Processors.

Entry Processors are not about making JSR-107 into a Data Grid. They are about providing developers with the ability to programmatically update Cache Entries, typically in terms of their business objects, atomically, without doing something like the following:

cache.lock(key);
V value = cache.get(key); 
value.update(); 
cache.put(key, value); 
cache.unlock(key);

Compare this with something like:

cache.invokeEntryProcessor(key, new ValueUpdater(value));
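To make the contrast concrete, here is a minimal sketch of the shopping-cart case from earlier, using ConcurrentHashMap.compute as a stand-in for the cache's invoke operation (JSR-107 types are not assumed on the classpath; CartDemo and Cart are illustrative names):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

public class CartDemo {
    // Hypothetical value type: a shopping cart stored as one cache entry.
    static final class Cart {
        final List<String> items = new ArrayList<>();
    }

    // Equivalent in spirit to cache.invokeEntryProcessor(key, updater):
    // the update runs while the map holds the entry's internal lock, so
    // there is no explicit lock/get/put/unlock sequence and no window in
    // which another thread can interleave.
    public static int addItem(ConcurrentHashMap<String, Cart> cache,
                              String key, String item) {
        Cart cart = cache.compute(key, (k, c) -> {
            if (c == null) c = new Cart();  // create the entry on first use
            c.items.add(item);              // mutate in place, atomically
            return c;
        });
        return cart.items.size();
    }
}
```

The five-line lock-based sequence collapses into one call, and the atomicity obligation moves from the caller to the cache.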

> I understand that many of the concurrent map like methods here also fall into that camp, but they don't have such an implementation overhead. On the implementation overhead front, the way that the API is currently constructed means the implementor has a bizarre contract to meet (make the execution of this big method that might perform multiple arbitrary operations atomic, probably going to require locking) and the user has a bizarre and hard to express set of limitations too: basically don't do anything that would trigger a deadlock (no interaction with any caches).

Completely disagree. Coherence and other products have been providing this type of functionality, without "bizarre and hard to express" limitations, for nearly 10 years.

> A simpler approach to me would be to do something like what is supported in JDK 8 now:
>
> AtomicReference.updateAndGet(UnaryOperator<V> updateFunction);
>
> This puts a simple constraint on the user (the method must be nullipotent, aka "side-effect-free") and can be implemented in a much more performant way (CAS loop).

Right. Coherence, as does the JSR, mandates that EntryProcessors must be idempotent.

yannis666 commented 11 years ago

I find the entry processor essential for performing (simple) atomic operations without requiring the API to expose a complex locking schema. If you require something equivalent to int inc(key) { int i = map.get(key) + 1; map.put(key, i); return i; } and wanted it to be atomic, it would not be possible (in the absence of an EP) without exposing a locking API.
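That inc example can be made atomic in a single call on a ConcurrentHashMap, which is exactly the read-modify-write shape an EntryProcessor would package behind the cache API; the sketch below is illustrative, not a JSR-107 type:

```java
import java.util.concurrent.ConcurrentHashMap;

public class IncDemo {
    // The naive get-then-put version loses updates under contention because
    // two threads can read the same old value. merge performs the whole
    // read-modify-write as one atomic step for the key, with no locking API
    // exposed to the caller.
    public static int inc(ConcurrentHashMap<String, Integer> map, String key) {
        return map.merge(key, 1, Integer::sum);
    }
}
```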

Personally I would be much happier hearing a discussion about removal of methods from Cache since many (most?) could be easily implemented using simple EPs.
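As a sketch of that direction, a convenience method like getAndPut reduces to a one-entry atomic operation, which is the same shape an EP provides; modeled here on ConcurrentHashMap.compute rather than any JSR-107 type:

```java
import java.util.concurrent.ConcurrentHashMap;

public class GetAndPutDemo {
    // getAndPut expressed as a single atomic step per key: capture the value
    // being replaced and install the new one in one operation, rather than
    // as a separate get followed by a put.
    public static <K, V> V getAndPut(ConcurrentHashMap<K, V> cache, K key, V value) {
        Object[] previous = new Object[1];
        cache.compute(key, (k, old) -> {
            previous[0] = old;  // remember what was there (null if absent)
            return value;       // install the new value
        });
        @SuppressWarnings("unchecked")
        V result = (V) previous[0];
        return result;
    }
}
```

Most of the other Cache convenience methods (putIfAbsent, getAndReplace, and so on) have the same one-entry atomic shape.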

EPs are a very simple and elegant way to provide for simple atomic operations on a cache.

Cotton-Ben commented 11 years ago

EntryProcessor is fundamental to this JSR being attractive to end-users. Users deserve to know that this minimum capability is present from any/all providers claiming to deliver a Java standard Caching solution. UNBURDEN THE USER FROM DOING THE DIRTY WORK. Empower the user with a graceful API (right you are, Yannis) to solve problems they will inevitably be faced with tackling. EntryProcessor is a must. Not providing it would make JCACHE look lazy and simplistic.

Cotton-Ben commented 11 years ago

> Entry Processors are not about making JSR-107 into a Data Grid. It's about providing developers with the ability to programmatically update Cache Entries, typically in terms of their business objects, atomically [...]

It is absolutely essential that this capability be something the end-user can expect as JCACHE standard.

gregrluck commented 11 years ago

Agree with Yannis and Ben's points above.

gregrluck commented 11 years ago

It looks like we are all in agreement here. Closing this one for now.