google / guava

Google core libraries for Java
Apache License 2.0
50.14k stars 10.89k forks source link

LoadingCache.getIfPresent(key) returns null after LoadingCache.getUnchecked(key) returned non null value #2131

Open alvinlin123 opened 9 years ago

alvinlin123 commented 9 years ago

I am using the LoadingCache in a highly concurrent system and I observed some behavior that I am not sure is intended (to me it looks like a bug :) ). I wrote the following sample code to illustrate the problem :

public static void main(String[] args) {
        final LoadingCache<String, Object> cache = 
            CacheBuilder.newBuilder()
                                 .expireAfterAccess(65, TimeUnit.MINUTES)
                                 .build(new CacheLoader<String, Object>(){

                                                    @Override
                                                     public Object load(String arg0) throws Exception {
                                                                Object obj = new Object();
                                                                    System.out.println("creating object: " + obj);
                                                                    return obj;
                                                     }
                                  });

        int threadCount = 600;
        final String key = "hello/world";
        Runnable task = new Runnable() {

            @Override
            public void run() {
                try {
                    Object valueFromUnchecked = cache.getUnchecked(key);
                    if (valueFromUnchecked == null) {
                        System.out.println(Thread.currentThread().getName() + " valueFromUnchecked is null!!!");
                    }

                    Object value = cache.getIfPresent(key);
                    if (value == null) {
                        System.out.println(Thread.currentThread().getName() + " value is null!!!");
                    }

                    if (value != valueFromUnchecked) {
                        System.out.println(String.format(Thread.currentThread().getName() + "valueFromUnchecked:%s, value:%s", valueFromUnchecked, value));
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                }
            }
        };

        for (int i = 0; i < threadCount; i++) {
            Thread t = new Thread(task);
            t.setName("thread-" + i);
            t.start();
        }

Sometimes thing are fine, I don't get the output of " value is null!!!" lines and some times I do. I never get the output of "valueFromUnchecked from getUnchecked is null!!!"

My understanding of from LoadingCache java doc, is that if getUnchecked() does not return null then getIfPresent() shouldn't return null as well; given that the cache has not expired yet.

ben-manes commented 9 years ago

A cursory glance...

It seems like final field visibility rules aren't working as expected, since that should avoid reordering and a SVR should never return null. Probably needs to be tested on JDK6/7 for good measure. Nothing in the code looks obviously wrong fence-wise.

ben-manes commented 9 years ago

Actually the statement on SVR is wrong and just some confusion when debugging. The problem is related to the if (count != 0) guard clause. If the cache is pre-populated with a different entry then the test passes.

alvinlin123 commented 9 years ago

thanks for looking into this. Just FYI I compiled the above code into jvm 7 class file (because my prof system runs jvm 7). But I did run the code in both jvm7 and jvm8 both produced same result.

ben-manes commented 9 years ago

Okay, so what is happening is that when LoadingValueReference#waitForValue() returns the count = 0. This is because the computing thread doesn't increment it until after its published through the value reference. This lets the reader see the value immediately, but before the count changes. Then when the reader immediately performs a getIfPresent the count is still zero and it receives null due to an early exit. When the computing thread finally increments the count then everything works normally. This means that one could argue that the expected happens-before relationship is being violated.

The best approach would be to simply remove the guard conditions. I don't remember when they were introduced. Most likely Bob added them in the original computing map and, since we inherited that, we never thought about his optimization. I don't think it buys anything and a load barrier shouldn't be required for correctness.

Sorry that this wasn't immediately obvious. Its been a very long time since I've reasoned through this code. Unfortunately I don't know if this will be fixed will be fixed any time soon.

ben-manes commented 9 years ago

Actually the guards might have been okay when Bob added them, and an oversight when we introduced LoadingCache. When we used the Map interface there was no way to peek into the cache, as a get was computing. So that's when it was probably introduced to mirror the style in other methods and we didn't consider that interaction.

cgdecker commented 9 years ago

From what I can tell, the attempt to fix this in commit bb5673f60ce9dcda1c228b0d0e2ce2485468f965 didn't work. I tried the sample program provided and am still seeing the issue. I'm looking into it now.

cgdecker commented 9 years ago

So, the count != 0 check doesn't seem to have any bearing on the behavior here, because the count being incremented is only an indicator that the loaded value has been stored (that is, that the entry's LoadingValueReference has been replaced with a StrongValueReference, in this case.) Even with the check removed, the code ends up calling LoadingValueReference.get(), which returns null.

The general issue seems to be that get[Unchecked] is allowed to return as soon as it can get the loaded value. Meanwhile, the thread that's actually doing the loading gets the loaded value and then must store the loaded value. Meanwhile, other threads that were waiting on the value loading can call getIfPresent, racing with the storing of the value.

I can see a couple possible ways to fix this:

  1. Don't let threads blocking on the loading of the value see the value until it's been stored. This seems like (potentially?) a fairly invasive change, though I haven't tried it yet.
  2. LoadingValueReference.get() could be changed to check if its futureValue future is done, returning the value from it if so on the assumption that the value, having loaded successfully, is in the process of being stored. This is a very simple change and fixes the issue for the example program given, but I'm not certain it's a correct change.

It's also not entirely clear to me that this is something that should be changed, weird though it may be. What's the value in calling getIfPresent for a key on the same thread that already called get[Unchecked] on it?

I'm going to spend a bit more time looking at this today, but I'm likely going to push any changes back to 20.0 since this isn't a simple matter of removing a count != 0 check.

ben-manes commented 9 years ago

I can't reproduce it with master and the test program. Ran with TestNG @Test(threadPoolSize = 10, invocationCount = 1000). I haven't looked into your analysis though.

ben-manes commented 9 years ago

Sorry, had a typo. reproduced.

cgdecker commented 9 years ago

I've got a solution in the works... it looks like it's not TOO invasive to make blocking gets only return after the value has been stored.

alvinlin123 commented 7 years ago

so ... when will the fix of the issue be release? looks like it was intended, then removed from 20.0 release?

uttamtiwari commented 6 years ago

I am also facing similar issue, can any one tell me which version I should use where the fix has been applied? However I am not calling getUnchecked I am directly calling getIfPresent function and seeing null value sometimes.

su1234rita commented 2 years ago

Any update on this fix? I am also facing a similar issue where getIfPresent returns a null sometimes.

cgdecker commented 2 years ago

I think it's unlikely we'll fix this at this point, though if someone wants to contribute a fix we'd be happy to look at that.

Note that we recommend people use Caffeine instead of Guava's cache; it's based on Guava's cache APIs but maintained by @ben-manes who has far more caching expertise than anyone on the Guava team these days.

ben-manes commented 2 years ago

Unfortunately I don't have the time to assist in maintaining Guava's Cache, and like the other coauthors, have moved on. I would prefer to have the difference between libraries be primarily performance and features for new code, rather than have users deal with bugs in their existing code. I can help review patches if anyone is interested in sending bug fixes.

I am familiar with the code base but Charles and I inherited it from Bob, where it was already very complex by the time we started adding / rewriting the caching logic. Unfortunately we didn't realize the linearization problems early enough so there were some original sins, likely further mistakes introduced later on, and we were both 20%ers (with Charles going full time for a short period). For example how to handle explicit writes for an in-flight load is not linearized (puts stomp the load whereas removals no-op, iirc). When writing narrow portions from the perspective of a transient best-effort cache the issues seemed benign, but taking a step back it's clearly wrong. The overwhelming amount of code due to forking ConcurrentHashMap makes it hard to wade through. Caffeine benefits from decorating the map, the addition of Java 8's computes, being much more test driven, and learning from these past experiences.

I wrote a simple Lincheck test below that could be extended and drive iterations on fixes for these linearization problems. However as @cgdecker mentioned since this cache is in maintenance mode, if possible then please try Caffeine where I've tried to be more diligent on correctness.

Lincheck Test ```java import org.jetbrains.kotlinx.lincheck.LinChecker; import org.jetbrains.kotlinx.lincheck.annotations.Operation; import org.jetbrains.kotlinx.lincheck.annotations.Param; import org.jetbrains.kotlinx.lincheck.paramgen.IntGen; import org.jetbrains.kotlinx.lincheck.strategy.stress.StressOptions; import org.jetbrains.kotlinx.lincheck.verifier.VerifierState; import org.testng.annotations.Test; import com.google.common.cache.CacheBuilder; import com.google.common.cache.CacheLoader; import com.google.common.cache.LoadingCache; @Param(name = "key", gen = IntGen.class, conf = "1:10") public final class GuavaLincheckTest extends VerifierState { private final LoadingCache cache; public GuavaLincheckTest() { cache = CacheBuilder.newBuilder().build(CacheLoader.from(k -> -k)); } @Operation public Integer getIfPresent(@Param(name = "key") int key) { return cache.getIfPresent(key); } @Operation public Integer getUnchecked(@Param(name = "key") int key) { return cache.getUnchecked(key); } @Test public void stressTest() { var options = new StressOptions() .iterations(100) .invocationsPerIteration(10_000); new LinChecker(getClass(), options).check(); } @Override protected Object extractState() { return cache.asMap(); } } ```