Open alvinlin123 opened 9 years ago
A cursory glance...
Unsafe.fullFence()
)It seems like final field visibility rules aren't working as expected, since that should avoid reordering and a SVR should never return null. Probably needs to be tested on JDK6/7 for good measure. Nothing in the code looks obviously wrong fence-wise.
Actually the statement on SVR is wrong and just some confusion when debugging. The problem is related to the if (count != 0)
guard clause. If the cache is pre-populated with a different entry then the test passes.
thanks for looking into this. Just FYI I compiled the above code into jvm 7 class file (because my prof system runs jvm 7). But I did run the code in both jvm7 and jvm8 both produced same result.
Okay, so what is happening is that when LoadingValueReference#waitForValue()
returns the count = 0
. This is because the computing thread doesn't increment it until after its published through the value reference. This lets the reader see the value immediately, but before the count changes. Then when the reader immediately performs a getIfPresent
the count is still zero and it receives null due to an early exit. When the computing thread finally increments the count then everything works normally. This means that one could argue that the expected happens-before relationship is being violated.
The best approach would be to simply remove the guard conditions. I don't remember when they were introduced. Most likely Bob added them in the original computing map and, since we inherited that, we never thought about his optimization. I don't think it buys anything and a load barrier shouldn't be required for correctness.
Sorry that this wasn't immediately obvious. Its been a very long time since I've reasoned through this code. Unfortunately I don't know if this will be fixed will be fixed any time soon.
Actually the guards might have been okay when Bob added them, and an oversight when we introduced LoadingCache
. When we used the Map
interface there was no way to peek into the cache, as a get
was computing. So that's when it was probably introduced to mirror the style in other methods and we didn't consider that interaction.
From what I can tell, the attempt to fix this in commit bb5673f60ce9dcda1c228b0d0e2ce2485468f965 didn't work. I tried the sample program provided and am still seeing the issue. I'm looking into it now.
So, the count != 0
check doesn't seem to have any bearing on the behavior here, because the count
being incremented is only an indicator that the loaded value has been stored (that is, that the entry's LoadingValueReference
has been replaced with a StrongValueReference
, in this case.) Even with the check removed, the code ends up calling LoadingValueReference.get()
, which returns null
.
The general issue seems to be that get[Unchecked]
is allowed to return as soon as it can get the loaded value. Meanwhile, the thread that's actually doing the loading gets the loaded value and then must store the loaded value. Meanwhile, other threads that were waiting on the value loading can call getIfPresent
, racing with the storing of the value.
I can see a couple possible ways to fix this:
LoadingValueReference.get()
could be changed to check if its futureValue
future is done, returning the value from it if so on the assumption that the value, having loaded successfully, is in the process of being stored. This is a very simple change and fixes the issue for the example program given, but I'm not certain it's a correct change.It's also not entirely clear to me that this is something that should be changed, weird though it may be. What's the value in calling getIfPresent
for a key on the same thread that already called get[Unchecked]
on it?
I'm going to spend a bit more time looking at this today, but I'm likely going to push any changes back to 20.0 since this isn't a simple matter of removing a count != 0
check.
I can't reproduce it with master and the test program. Ran with TestNG @Test(threadPoolSize = 10, invocationCount = 1000)
. I haven't looked into your analysis though.
Sorry, had a typo. reproduced.
I've got a solution in the works... it looks like it's not TOO invasive to make blocking gets only return after the value has been stored.
so ... when will the fix of the issue be release? looks like it was intended, then removed from 20.0 release?
I am also facing similar issue, can any one tell me which version I should use where the fix has been applied?
However I am not calling getUnchecked
I am directly calling getIfPresent
function and seeing null value sometimes.
Any update on this fix? I am also facing a similar issue where getIfPresent returns a null sometimes.
I think it's unlikely we'll fix this at this point, though if someone wants to contribute a fix we'd be happy to look at that.
Note that we recommend people use Caffeine instead of Guava's cache; it's based on Guava's cache APIs but maintained by @ben-manes who has far more caching expertise than anyone on the Guava team these days.
Unfortunately I don't have the time to assist in maintaining Guava's Cache, and like the other coauthors, have moved on. I would prefer to have the difference between libraries be primarily performance and features for new code, rather than have users deal with bugs in their existing code. I can help review patches if anyone is interested in sending bug fixes.
I am familiar with the code base but Charles and I inherited it from Bob, where it was already very complex by the time we started adding / rewriting the caching logic. Unfortunately we didn't realize the linearization problems early enough so there were some original sins, likely further mistakes introduced later on, and we were both 20%ers (with Charles going full time for a short period). For example how to handle explicit writes for an in-flight load is not linearized (puts stomp the load whereas removals no-op, iirc). When writing narrow portions from the perspective of a transient best-effort cache the issues seemed benign, but taking a step back it's clearly wrong. The overwhelming amount of code due to forking ConcurrentHashMap makes it hard to wade through. Caffeine benefits from decorating the map, the addition of Java 8's computes, being much more test driven, and learning from these past experiences.
I wrote a simple Lincheck test below that could be extended and drive iterations on fixes for these linearization problems. However as @cgdecker mentioned since this cache is in maintenance mode, if possible then please try Caffeine where I've tried to be more diligent on correctness.
I am using the LoadingCache in a highly concurrent system and I observed some behavior that I am not sure is intended (to me it looks like a bug :) ). I wrote the following sample code to illustrate the problem :
Sometimes thing are fine, I don't get the output of " value is null!!!" lines and some times I do. I never get the output of "valueFromUnchecked from getUnchecked is null!!!"
My understanding of from LoadingCache java doc, is that if
getUnchecked()
does not return null thengetIfPresent()
shouldn't return null as well; given that the cache has not expired yet.