google / guava

Google core libraries for Java
Apache License 2.0
49.91k stars 10.83k forks source link

Cache.asMap() equality with dead entries #5961

Closed ben-manes closed 2 years ago

ben-manes commented 2 years ago

When reviewing Map's equality contract and the implementations in ConcurrentHashMap and AbstractMap, I am not sure if Guava and Caffeine honor this strictly enough.

The consistency property between invocations requires that the results are the same if there are no modifications to the information used. Therefore, usages should expect that this operation may return misleading results if either map or the data held is modified during the execution of this method. This characteristic allows for comparing the map sizes and assuming stable mappings, as done by AbstractMap-based maps.

The symmetric property requires that the result is the same for all implementations of Map.equals(Object). That is only defined in terms of the stable mappings provided by Map.entrySet(), which is all that ConcurrentHashMap uses in its determination. That leads to the size() optimization forcing that count be consistent with the returned mappings because it may or may not be used by JDK maps.

The cache's define their size() methods as including entries that have expired or been reference collected, but have not yet been removed from the map. An iteration over the map may trigger the removal of these dead entries when skipped over during traversal. To honor both consistency and symmetry the comparison must should only occur after a call to cleanUp() and the absence of concurrent modifications. Currently Guava and Caffeine both inherit the AbstractMap implementation.

A simple solution would be to implicitly call cleanUp() when Cache.asMap().size() is invoked, but would not be needed on Cache.size() as explicitly an estimate. That would resolve the discrepancy between the mappings and count, but change the cost by performing operations under a lock (which Guava may only do partially). That seems like a poor tradeoff, especially since users most often call the Map's size() for an int value rather than the cache interface's long. It would then seem appropriate to make the caveat that users must call cleanUp() themselves prior to calling equals, which is usually the advice we give casually anyway. This may not need to be documented in the interfaces, but probably a worthwhile stated assumption in the implementations rather than glossed over.

What do you think is best?

ben-manes commented 2 years ago

Closing as having documented this in the issue tracker is likely good enough if one had to dig in. After sorting out my thoughts, I migrated Caffeine off of AbstractMap for some minor efficiencies. That now includes an internal JavaDoc that is basically the above to describe the assumptions if using Map.equals.