jsr107 / jsr107spec

JSR107 Cache Specification
Apache License 2.0
413 stars 164 forks source link

Cache.iterator / Iterator.next() should not return null #345

Open cruftex opened 8 years ago

cruftex commented 8 years ago

The current JavaDoc is:

  /**
   * {@inheritDoc}
   * <p>
   * The ordering of iteration over entries is undefined.
   * <p>
   * During iteration, any entries that are a). read will have their appropriate
   * CacheEntryReadListeners notified and b). removed will have their appropriate
   * CacheEntryRemoveListeners notified.
   * <p>
   * {@link java.util.Iterator#next()} may return null if the entry is no
   * longer present, has expired or has been evicted.
   *
   * @throws IllegalStateException if the cache is {@link #isClosed()}
   */
  Iterator<Cache.Entry<K, V>> iterator();

Proposed Change / next() may yield null

Current statement:

{@link java.util.Iterator#next()} may return null if the entry is no longer present, has expired or has been evicted.

This is against the iterator interface contract. If implementations really do this, all applications would be very error prone, since there are no null checks. Suggest removal of this sentence.

Implementations need to check for expiry before returning true to hasNext() and then might return an expired entry on the call to next(). That is a conceptional problem in the iterator interface. But, since there is always time between the check in the cache and the further processing in the application it does not matter.

Either remove or add for clarification:

Cache implementation may not return expired entries, that expired just before the call to hasNext()

cruftex commented 8 years ago

After the last edit we have at the moment:

When iterating over a cache it must be assumed that the underlying cache may be changing, with entries being added, removed, evicted and expiring. {@link java.util.Iterator#next()} may therefore return null.

We need to remove that.

gregrluck commented 7 years ago

Perhaps it does, but Brian and I thought long and hard about this. You also do not have anyone else weighing in on this topic.

You are also creating pull requests to change the contract but not creating changes to the tests to confirm that the implementation is doing the right thing. Any pull request that changes the contract must have a TCK test for it.

brianoliver commented 7 years ago

I think an example may help explain why we thought returning null was not a good idea.

Imagine 10 cache entries, all of which expire after you requested an iterator.

What should the semantic be? Should the application iterate over 10 null entries? What use would that be to an application? Without knowing the key, perhaps to tell what has expired, what value would a null provide? If an implementation has to provide the keys and just null values, that would force an implementation to keep keys alive for sometime in the cache after entries have expired. This would be extremely inefficient, especially in a distributed system.

cruftex commented 7 years ago

@brianoliver:

Should the application iterate over 10 null entries?

I don't understand your comment. No it shouldn't. This is why I opened the issue.

It is not defined what happens when an entry expired after the iterator was requested. None or some entries may be returned. But, IMHO, never a null reference, because this would break the general usage pattern of the iterator.

@gregrluck:

Perhaps it does, but Brian and I thought long and hard about this. You also do not have anyone else weighing in on this topic.

There isn't much community activity in the other tickets as well. It might also be an indicator that there is no interest in an iterator at all. Personally, I did never need it and I am not a fan of it.

You are also creating pull requests to change the contract but not creating changes to the tests to confirm that the implementation is doing the right thing. Any pull request that changes the contract must have a TCK test for it.

There is no need for it. We don't do concurrent tests and the other existing tests expect that Iterator.next() is never returning a null. E.g. CacheTest.iterator():

  @Test
  public void iterator() {
    LinkedHashMap<Long, String> data = createLSData(3);
    cache.putAll(data);
    Iterator<Cache.Entry<Long, String>> iterator = cache.iterator();
    while (iterator.hasNext()) {
      Cache.Entry<Long, String> next = iterator.next();
      assertEquals(next.getValue(), data.get(next.getKey()));
      iterator.remove();
      data.remove(next.getKey());
    }
    assertTrue(data.isEmpty());
  }
chrisdennis commented 7 years ago

The reason no one is complaining about it is that it's overly permissive, not overly restrictive. Implementors will do the right thing and be strict about not returning null (and therefore still be compliant). Most users won't even realize the spec allows this behavior because there only experience will be in using the spec through one of the implementations.

cruftex commented 7 years ago

The reason no one is complaining about it is that it's overly permissive, not overly restrictive.

Chris, this reasoning is solely from the side of the implementor. But the contract is for both, users and implementations.

The aim of the standard is compatibility. With this contract, every JCache compliant application has to check for nulls, to be compatible with every legal JCache implementation. Does it?!

chrisdennis commented 7 years ago

@cruftex You misunderstand me, I agree with you. I'm simply trying to explain to Greg the likely reason for nobody complaining about it.

This situation is akin to ISO defining a standard for doors without requiring handles. Door manufacturers just add handles in spite of the standard not requiring it, and the users just see handles everywhere because no one would install a door without a handle. Everyone is happy (users and door manufacturers), except for the guys who insist on walking around with a bag of handles just in case they need to go through a strictly adherent ISO standard door.

cruftex commented 7 years ago

Got ya. Thanks for the entertaining example. Saves my evening :)

brianoliver commented 7 years ago

Thanks lively discussion @chrisdennis and @cruftex.

Some of the most challenging parts of designing the JCache API and then implementing the TCK has been dealing with issues where there isn't a universally acceptable answer. In these circumstances we've basically had no choice to leave parts of the behavior permissive in nature.

Of course there are also places which we've obviously missed and should be resolved without impacting applications or implementations (because there are at least 8 of them now).

eg: In many places a solution for an "in-process" cache implementation will often contradict the requirements and jeopardize the efficiency/scalability etc of distributed implementations and vice-versa) and thus we have no option but to leave things permissive.

I'm certainly happy to add some clarifications around this, but given that iterating over cache entries is something virtually no application I've seen does, it's seems to be a low priority issue to resolve.

gregrluck commented 7 years ago

Cannot seem to close this so moving it to 2.0