memory leak on LRUCache

andre-pt commented 5 months ago

First of all, I'm aware that LRUCache belongs to java-util, but it seems to be more relevant in this project.

Problem

We have been upgrading json-io from 4.14.0 to 4.24.0 and when we got to production, we found out a constant increase of heap size. After capturing some heap dumps, we narrowed the cause to LRUCache and LinkedHashMap:

Even though I was not able to reproduce the issue locally, I suspect it's related with the high concurrency environment we use json-io. In peak time, we can deserialize up to 500 times per second using hundreds of threads.

In the following screenshot, we have the heap of a service before and after the upgrade:

As a mitigation and to validate the theory, we forked json-io locally and replaced LRUCache with Caffeine:

class JsonParser {
    private static final JsonObject EMPTY_ARRAY = new JsonObject();  // compared with ==
    private static final Cache<String, String> stringCache = Caffeine.newBuilder()
            .maximumSize(2500)
            .build();
    private static final Cache<Number, Number> numberCache = Caffeine.newBuilder()
            .maximumSize(2500)
            .build();

I know this is not a proper solution, because json-io wants to be independent of third party dependencies, this is a temporary solution to unblock us. We went to production 1h ago and it's too soon to see results, I will share as soon as I can.

Possible solutions

I'm not sure I understand 100% the reason for these 2 caches in the JsonParser, but I assume there was a valid reason for it :) I think that 2500 entries cache might not bring a lot of value in our use case where we have 100s KB json documents. Could you confirm that?

Anyway, I have the feeling it would be great to have the cache configurable. The way I thought it could work was to have it as part of the ReadOptions instead of a static property of JsonParser. This way we could even configure different kind of caches depending the use case with in the same JVM. Having it configurable, users could bring their own implementation (Caffeine for instance), instead of relying on LRUCache or even having a NoOpCache implementation to disable it.

What's your thoughts on it?

Thanks a lot for your work on this library!

ben-manes commented 5 months ago

It looks like LRUCache makes the innocent mistake of treating Map.get(key) as a read api under a read-write lock. This is surprisingly not the case for an LRU where every read is a write that modifies the internal structure. The class JavaDoc bolds this but it is often overlooked and a common mistake (I too made it): In access-ordered linked hash maps, merely querying the map with get is a structural modification.

jshell> var map = new LinkedHashMap<String, Integer>(16, 0.75f, /* accessOrder */ true)
map ==> {}

jshell> map.putAll(Map.of("a", 1, "b", 2, "c", 3));

jshell> map
map ==> {c=3, a=1, b=2}

jshell> map.get("a")
$4 ==> 1

jshell> map.toString()
$5 ==> "{c=3, b=2, a=1}"

jshell> map.get("c")
$6 ==> 3

jshell> map.toString()
$7 ==> "{b=2, a=1, c=3}"

Simple fixes would be to set the accessOrder flag to false for insertion order (FIFO) or make the Map.get(key) method use the write lock. Likely the read/write lock is for performance, but for a simple in-memory lookup the overhead of that lock type usually exceeds any benefit. Thus, Collections.synchronizedMap should have similar performance with less code complexity. There are many schemes for a concurrent cache, but if the performance of the read/write lock was acceptable then you will probably be happy enough with a synchronized one and can avoid bothering with anything fancier. Making it a pluggable factory further defers you having to think about it in your own code base.

jdereg commented 5 months ago

I had been traveling the last week and haven't had time to look into this. I appreciate the detail research you put into this. I will get this incorporated, likely by the end of this week.

jdereg commented 5 months ago

This has been corrected in json-io 4.25.0. First, LRUCache in java-util has been updated to remove the possible memory leak. Second, the caches in JsonParser are now instance-based, not static. The reason for the caches, and to your question of why 2500 elements, is that when parsing a large JSON file, it saves memory when you can re-use the same Java instance of a paricular String, Number, etc. After parsing into RAM, you do not want a bunch of "true" Strings on the heap (or Double 1.0), but instead you want one instance, and the others to reference that same instance. This is only done for immutable items, so it saves memory and there is no risk of one of the values being modified and not seeing the modification reflected elsewhere.

ben-manes commented 5 months ago

Since moveToHead is guarded by only a read lock, multiple readers could try to reorder in the doubly linked list. These updates are not atomic and could be interleaved to create a cycle or other corruption. In your tests, you should validate this list is correct which I believe will show it is still an unsafe implementation.

jdereg commented 5 months ago

I created a new java-util 2.11.0. Please take a look at LRUCache. I re-wrote it to use a ConcurrentHashMap that maintains the nodes, and then uses a background thread (executor service) to queue cleanup requests after put() so that it does not delay get(), put(), or remove(). These methods operate as they do in ConcurrentHashMap. The put() api schedules a cleanup request. Also, it carefully does not schedule bunches of cleanup requests only 1 beyond anyone running. More LRUTests added.

ben-manes commented 5 months ago

This will work. I personally dislike creating new threads in a library, especially if there is no way to shut it down. It can cause issues like not allowing classloaders to be GC'd (a classic problem on Tomcat). In your approach here, I'd use ForkJoinPool.commonPool() and, if you want to schedule it, CompletableFuture.delayedExecutor.

However, I'd recommend reverting to your previous version and switching the locking. Instead of a read-write lock (aka shared-exclusive lock), use an exclusive try-lock. On a successful read from the ConcurrentMap, attempt to lock to perform the LRU reordering but skip it if not successful. A write must acquire it like before, but its work remains fast as O(1). The insight here is that you do not need strict LRU ordering and best-effort will select an equally good victim, so under heavy load your policy degrades to sample the read requests. This approach has excellent read performance as it is effectively lock-free (in my benchmark this had 290M reads/s and 19M writes/s on my M3 laptop).

private final Lock lock = new ReentrantLock();

public V get(Object key) {
  Node node = map.get(key);
  if (node == null) {
    return null;
  }
  if (lock.tryLock()) {
    try {
      moveToHead(node);
    } finally {
      lock.unlock();
    }
  }
  return node.value;
}

public V put(K key, V value) {
  lock.lock();
  try {
    ...
  } finally {
    lock.unlock();
  }
}

jdereg commented 5 months ago

Ben,

Thank you for your insights here. I am aware of who you are and truly appreciate your thoughtful comments (and time).

I just released java-util 2.12.0 which contains a much improved LRUCache inspired by your comments. The outer LRUCache still implements Map, and internally it delegates to the strategy (implementation) chosen. There are currently two strategies, a 'locking-based' strategy, which is inspired by your try-lock comments, and a 'threaded-based' strategy, which is the ConcurrentHashMap based implementation that schedules a thread to manage the clean-up. The strategy selected depends on the constructor used on LRUCache.

The LRUCacheTest has been updated to run against both strategies with parameterized testing.

Your questions and comments are always welcome. Thank you!

ben-manes commented 5 months ago

Oh I’m just happy to help. It’s been a fun hobby where I’ve made all these mistakes trying to figure out what seems like it should be trivially easy, but somehow isn’t.

For the try-lock approach, a slight bug is when the read races with an eviction for the same node. This is when the victim is read, thread is descheduled, it’s evicted, and then the reader awakes and acquires the try-lock. In the current code it will relink the entry which won’t reside in the map. You can check by nulling out the links and skipping the reorder if unlinked. This has an extra benefit of avoiding gc nepotism.

fwiw, the threading can still use the Executor interface for the cleanup pool. I am just hesitant at creating threads directly and defaulting to JDK pools side steps those problems.

Thanks for the prompt fixes and supporting the community.

jdereg commented 5 months ago

If you don't mind one last review, I made the changes you have highlighted and am poised to release it (they are committed to HEAD). This should fix the gc nepotism issue and simplifies the thread scheduling to only using one scheduler. If you know of a way to repeat the gc nepotism issue, please let me know.

java-util is intended to be a small library with no dependencies, while still maintaining JDK 1.8 compatibility yet contains the manifest for JDK 1.9+. Some of the useful utilities are DeepEquals, Converter, CompactMap, CaseInsensitiveMap, and DateUtilities parser (very tolerant).

Thank you in advance, John

ben-manes commented 5 months ago

lgtm

jdereg commented 5 months ago

Thank you.

andre-pt commented 5 months ago

I'm sorry for the radio silence from my side. I would like to thank you both for the collaboration. It's beautiful to see this on open sources projects!

@jdereg I see you changed the JsonParser to hold a LinkedHashMap instead of the LRUCache. Now that LRUCache has multiple strategies, it would be great to be able to choose them via ReadOptionsBuilder.

Just for my curiosity: what's the reason to have it so "closed" by having both strategies inside LRUCache, instead of having a Cache interface and multiple implementations of the same? In the latter the library user could even implement their own Cache and use it inside the JsonParser. I'm happy to do a PR for this change if you feel it makes sense.

jdereg / json-io

memory leak on LRUCache #284

Problem

Possible solutions