imglib / imglib2-cache

Cache interfaces and java.lang.ref based implementation.
Other
6 stars 4 forks source link

Meaning of invalidate()/invalidateAll() #2

Closed tpietzsch closed 5 years ago

tpietzsch commented 7 years ago

I'm currently implementing SoftRefCache.invalidateAll().

We have to decide what invalidate(key) and invalidateAll() should actually do. (Of course, invalidateAll() it should be equivalent to calling invalidate(key) for all keys in the map.) invalidate(key) could mean several things:

Option 1: Purge with none of the usual guarantees: If the value is still referenced somewhere outside the cache it will be reloaded when the key is next requested (this might lead to two distinct objects with the same key). If the cache has a CacheRemover, it will not be notified.

Option 2: Purge with removal notification: If the value is still referenced somewhere outside the cache it will be reloaded when the key is next requested (this might lead to two distinct objects with the same key). Any CacheRemover will be notified.

Option 3: Weaken: If the value is still referenced outside the cache it will not be removed from the cache. Effectively, this means that whatever the internal storage of the cache is (guava, softref, ...) it is replaced by a WeakReference. The usual collection mechanism should then take care of GCing the value and notifying any CacheRemover.

I think all of these make sense in some scenarios and all have advantages and problems, e.g., Option 3 is safe with regard to references but it does not allow to force a reload of a particular entry (and this may be the goal of invalidate(key) actually).

(I'm implementing SoftRefCache.invalidateAll() as being compatible with Option 1 and 2 for now, but this is not set in stone...)

Are there any comments / preferences / opinions on this?

axtimwalde commented 7 years ago

@tpietzsch, option 3 does not sound particularly useful to me as it only changes the priority for future removal at an unknown point in time. I can immediately see option 1 and 2 to be useful and how I would use them. Notifying removers selectively is necessary do distinguish between the trigger of the event and potential listeners (e.g. consumerA notifies source about removal of modified element X, source calls invalidate(X) in consumerB, consumerB does NOT want to notify source that it has removed X.)

ctrueden commented 7 years ago

Would it make sense to have a (maybe optional) boolean notify flag for invalidate so that you can control whether the CacheRemovers get notified? Either that, or make invalidate and invalidateQuietly; we use this pattern in ImageJ Common for display event notification and it works OK, although it feels slightly hacky.

hanslovsky commented 7 years ago

Option 2 sounds most useful to me and the selective notify/flag mentioned by @axtimwalde and @ctrueden seem reasonable. Maybe, instead of boolean, it should be an int or an enum to allow to pass different states.

Just to make sure I understood correctly: In Option 2, any cached files files on disk would be deleted as well, right?

tpietzsch commented 7 years ago

Just to make sure I understood correctly: In Option 2, any cached files files on disk would be deleted as well, right?

Hmm, I hadn't thought of that, but that would be useful. The disk files are external to the "core" cache, so there would have to be something wrapping the two.

tpietzsch commented 7 years ago

@ctrueden I agree that it would be useful to two versions. To stay compatible with guava concepts and naming, I would prefer the version with invalidate and invalidateQuietly methods ( invalidate would behave as in guava, sending notifications).

hanslovsky commented 7 years ago

I just realized that there is one issue with deleting files on disk: We would have to wait for all deletions to be completed to ensure consistency. Maybe, instead of deleting, we should actually have a flag that indicates that a particular file on disk is not valid anymore and should not be loaded. After re-generating the data, this file could then be overwritten.

tpietzsch commented 7 years ago

@hanslovsky Coming back to your earlier question:

Just to make sure I understood correctly: In Option 2, any cached files files on disk would be deleted as well, right?

Strike my previous answer...

In Option 2, invalidate(key) notifies removal listeners. The disk cache is such a removal listener. So actually, this would be the opposite: It flushes the cached value to disk before deleting it to make sure that the disk cache has the latest state. And that would be pointless if you don't want to use the values on disk any more.

Deleting files on disk would actually be more useful with option 1, and also there you would not always want to do it. But one would have to notify the disk cache and the memory cache separately.

Maybe we can come up with a scenario where the deleting files on disk would be desired and then figure out how that could be best implemented?

hanslovsky commented 7 years ago

I have two use cases in mind:

  1. Caching data fetched through http or some other protocol might produce corrupted data.
  2. I have observed corrupted data in numpy based caches in imglyb. I would like to invalidate cells that are corrupted.

I am not sure if these are good examples because my lack of understanding what happens exactly in both cases. (1) could probably be handled by return code/exception/sanity checks within the loader that return a VolatileArray( numEntities, isValid = false ) in case of failure. This works only if the VolatileArray is not cached to disk if isValid == false. If that is the case, (1) should be a non-issue.

For (2) it might be desirable to selectivley invalidate accesses that are corrupted.

hanslovsky commented 6 years ago

I have more insight about how I would like to use the cache now as I am currently re-introducing painting functionality into the updated BigCAT and use DiskCachedCellImg as a canvas layer. In fact, paintable sources consist of three layers:

I would like to do two things with the cache:

  1. Accessing all pixels that have been painted into: As of now, I am tracking the bounding box of what has been painted and use that to determine what parts of both the mask and the canvas I need to persist. However, that can mean touching a lot more pixels than necessary. A better way would be to invalidateAll which would prompt the cache to persist all (dirty) cells to disk. With a lock (that can happen on BigCAT side) to make sure that the cache remains unchanged I could then just persist the cells that were modified, as all of them are persisted to disk.
  2. Remove all cells from cache (and disk). This is relevant for the canvas layer only, as I do not create a new DiskCachedCellImg after persiting into background.
hanslovsky commented 6 years ago

Update for my use cases: I do not use (1) as described in my comment from Jan 18 anymore. Instead, I create new disk caches whenever necessary. As far as I can tell, that is a reasonable solution. Use- case (2) for updating cached cell imgs (and other caches) remains very much desirable.

tpietzsch commented 5 years ago

"Invalidate" means: Remove from the cache, and discard any changes that have been made in this cache or any backing cache. In particular, if changes have been persisted to disk, remove those as well.

If the cache has a CacheRemover, that is notified, but via invalidate instead of onRemoval.

See javadoc in Invalidate, AbstractCache, and CacheRemover