Closed rulex123 closed 4 months ago
Given that this library doesn't do traditional file IO, there would be very little benefit in introducing async throughout the code, which would be necessary to support this. It would likely make the library slower overall, or require maintaining two parallel implementations of much of the code.
I don't know what your cache implementation looks like, but I think that async would probably only help if you are persisting the cache to disk, which seems unlikely to be beneficial in most cases due to the overhead. Are you sure it isn't a locking issue?
Perhaps try the advice in the FAQ entry Write Contention. There are two simple tricks that can resolve lock contention within ConcurrentHashMap.
The simplest: if your cache is fairly small, you might be suffering from hash collisions when the map locks a hash bin. Internally the map scales its number of locks (and therefore its write throughput) with the table's total size. A small map that is constantly discarding entries never grows large enough to handle the write concurrency, and entries that share a bin with a busy neighbor incur added latency. The fix is to increase the initialCapacity of the table so that entry locks don't interfere with each other.
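As a rough illustration of that sizing fix using only the JDK's `ConcurrentHashMap` (the keys here are made up; with Caffeine the equivalent knob is the builder's `initialCapacity(int)` setting):

```java
import java.util.concurrent.ConcurrentHashMap;

public class Main {
    public static void main(String[] args) {
        // A bounded cache that churns at ~10k entries never grows its table,
        // so concurrent writers keep colliding on the same bins. Oversizing
        // the table up front spreads entries (and their per-bin locks) out.
        // With Caffeine: Caffeine.newBuilder().initialCapacity(1 << 16)...
        var map = new ConcurrentHashMap<String, Integer>(/* initialCapacity */ 1 << 16);
        for (int i = 0; i < 10_000; i++) {
            map.put("ip-" + i, i);
        }
        System.out.println(map.size()); // prints 10000
    }
}
```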
The other approach is to decouple the computation from the hash table. That's what the async cache does, but you can perform the work on the caller rather than waste threads unnecessarily. Instead you insert a pending future, compute the value, and complete it with the result.
```java
AsyncCache<CacheKey, DecodedValue> cache = Caffeine.newBuilder().buildAsync();

DecodedValue get(CacheKey key, Loader loader) throws IOException {
  var future = new CompletableFuture<DecodedValue>();
  var prior = cache.asMap().putIfAbsent(key, future);
  if (prior != null) {
    // Another caller is already computing this entry; wait for its result.
    return prior.join();
  }
  try {
    // This caller won the race, so it performs the load on its own thread.
    var value = loader.load(key);
    future.complete(value);
    return value;
  } catch (Throwable t) {
    // Propagate the failure to any waiters, then rethrow to the caller.
    future.completeExceptionally(t);
    throw t;
  }
}
```
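For illustration, the same insert-a-pending-future pattern can be sketched with only the JDK, substituting plain `String`s for the `CacheKey`/`DecodedValue` types (the names and the simulated load are hypothetical). Concurrent callers for the same key block on the one pending future instead of each recomputing:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class Main {
    static final ConcurrentHashMap<String, CompletableFuture<String>> cache = new ConcurrentHashMap<>();
    static final AtomicInteger loads = new AtomicInteger(); // counts actual decodes

    // Compute on the calling thread; losers of the putIfAbsent race
    // wait on the winner's future rather than loading again.
    static String get(String key) {
        var future = new CompletableFuture<String>();
        var prior = cache.putIfAbsent(key, future);
        if (prior != null) {
            return prior.join();
        }
        try {
            loads.incrementAndGet();            // simulate one expensive decode
            String value = "decoded:" + key;
            future.complete(value);
            return value;
        } catch (Throwable t) {
            future.completeExceptionally(t);
            throw t;
        }
    }

    public static void main(String[] args) {
        var pool = Executors.newFixedThreadPool(8);
        var tasks = new CompletableFuture<?>[8];
        for (int i = 0; i < 8; i++) {
            tasks[i] = CompletableFuture.supplyAsync(() -> get("1.2.3.4"), pool);
        }
        CompletableFuture.allOf(tasks).join();
        pool.shutdown();
        System.out.println(cache.get("1.2.3.4").join()); // prints decoded:1.2.3.4
        System.out.println(loads.get());                 // prints 1: only one load ran
    }
}
```

Exactly one thread wins `putIfAbsent` per key, so the expensive load runs once even with eight concurrent callers.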
@ben-manes, awesome. Thanks for chiming in.
I'll go ahead and close this as there doesn't seem to be anything to do in this library.
Thank you for your suggestions @ben-manes! I have tried out the 2nd approach and it helped 🎉
We have a Java class (called `GeoIpService`) that is a wrapper around https://github.com/maxmind/GeoIP2-java and exposes a simple API to look up geo information for a given IP (e.g. lookup city, lookup country, etc.). This class in turn uses `DatabaseReader` to read from various MaxMind databases, and sets up caching for each database used (we have an implementation of `NodeCache` that uses Caffeine under the hood).

The `GeoIpService` class is distributed internally via a library and is used by several high-traffic microservices that need to geolocate an IP address. The issue we observe under production load is a high number of blocking calls in our implementation of `NodeCache.get(key, loader)`, which in turn calls https://github.com/ben-manes/caffeine/blob/a03e95ddc69e03e2ca205b9d2ed08c89f3235a32/caffeine/src/main/java/com/github/benmanes/caffeine/cache/Cache.java#L55-L81.

Caffeine provides async caching, however we can't really switch to that atm because we are limited by the current `NodeCache` interface. Could caching support for MaxMind databases be extended to include an async interface?