Open lananda opened 2 years ago
I have updated the bug with stack of other deadlocked thread
Here (getCachedValidationProviders) the synchronization is on class monitor - this is not going to help this situation, the HashMap itself is not synchronized & that is leading to the issue.
Those two methods are not synchronizing on the class, but on the GetValidationProviderListAction
instance (they're instance methods). What you synchronize on should not matter, as long as all threads accessing the same resource use the same lock, and that's the case for getCachedValidationProviders
and cacheValidationProviders
.
That being said, I see there's a static clearCache
method which bypasses the instance synchronization and indeed only synchronizes on the class. Is it possible that your problem is with that method?
this has been reviewed by JDK team too. they suggest the Validation.java WeakHashMap not being synchronized is the root cause.
I'm not disputing the fact that WeakHashMap
is not thread-safe, which is what your leaks seem to point at. WeakHashMap
is not thread-safe, that's a fact.
What I'm arguing is that we already synchronize access to that map, so adding Collections.synchronizedMap
should not be necessary.
But as I said above, the clearCache
method seems problematic, so if you can confirm that method caused your problem, fixing synchronization in that method should fix your problem as well.
Apart from @yrodiere 's comment, I would repeat what I initially said: these particular methods should generally be called once in the lifetime of an application.
I wonder if you are not creating EntityManagerFactory
and ValidatorFactory
far more often than you should be. I already said that several times but you're ignoring this point and I think it's important.
Not saying that we shouldn't fix a synchronization issue if we end up finding one - for now, it's not clear if there is one. But my point is that even if we had one, you shouldn't experience it, except if you're doing very weird things that I wouldn't recommend if you want your application to perform well.
Apart from @yrodiere 's comment, I would repeat what I initially said: these particular methods should generally be called once in the lifetime of an application. I wonder if you are not creating
EntityManagerFactory
andValidatorFactory
far more often than you should be. I already said that several times but you're ignoring this point and I think it's important.Not saying that we shouldn't fix a synchronization issue if we end up finding one - for now, it's not clear if there is one. But my point is that even if we had one, you shouldn't experience it, except if you're doing very weird things that I wouldn't recommend if you want your application to perform well.
I have requested client to confirm on this API usage. I will check that.
That being said, I see there's a static
clearCache
method which bypasses the instance synchronization and indeed only synchronizes on the class.
I opened #182 to fix that. But again, I'm not sure that's your problem, since your thread dump in #180 was incomplete :)
Here (getCachedValidationProviders) the synchronization is on class monitor - this is not going to help this situation, the HashMap itself is not synchronized & that is leading to the issue.
Those two methods are not synchronizing on the class, but on the
GetValidationProviderListAction
instance (they're instance methods). What you synchronize on should not matter, as long as all threads accessing the same resource use the same lock, and that's the case forgetCachedValidationProviders
andcacheValidationProviders
.That being said, I see there's a static
clearCache
method which bypasses the instance synchronization and indeed only synchronizes on the class. Is it possible that your problem is with that method?this has been reviewed by JDK team too. they suggest the Validation.java WeakHashMap not being synchronized is the root cause.
I'm not disputing the fact that
WeakHashMap
is not thread-safe, which is what your leaks seem to point at.WeakHashMap
is not thread-safe, that's a fact.What I'm arguing is that we already synchronize access to that map, so adding
Collections.synchronizedMap
should not be necessary.But as I said above, the
clearCache
method seems problematic, so if you can confirm that method caused your problem, fixing synchronization in that method should fix your problem as well.
Thanks, I have updated the stack, the issue is not happening during clearCache.
FullThreadDump_Validtion_Hashmap.txt
gsmet yrodiere I have now attached the full thread dump for this issue. could you please review once. I think the fix in clearCache is unrelated to the current issue, as this clearcache invocation is nowhere in the call stack & this issue is happening during initialization. i have checked the code EntityManagerFactory and ValidatorFactory are called only once in application lifecycle
Thanks.
To clarify: the only reason I'm reluctant to merge this patch is that it doesn't seem to fix anything. I'd like to understand how the patch can fix anything, and ideally also the problem itself.
As you can clearly see from your own thread dump, access to the hash map is already protected by a lock:
at java.util.WeakHashMap.get(WeakHashMap.java:403)
at javax.validation.Validation$GetValidationProviderListAction.getCachedValidationProviders(Validation.java:368)
- locked <0x00000005f44e3e10> (a javax.validation.Validation$GetValidationProviderListAction)
And after #182, that lock is being used by every method that could possibly access that map. So wrapping the map with Collections.synchronizedMap
is unlikely to change anything to the situation; you'll just have two locks instead of one.
I think the fix in clearCache is unrelated to the current issue, as this clearcache invocation is nowhere in the call stack
As far as I understand, clearCache
doesn't need to be visible in your thread dump; it's enough that someone called clearCache
concurrently to your stuck thread at some point. I.e. the thread that called clearCache
might have moved on, it's not the one being stuck.
i have checked the code EntityManagerFactory and ValidatorFactory are called only once in application lifecycle
That's good.
Did you also check that clearCache
is never called until shutdown?
Be aware that this code will be invoked reflectively, see https://github.com/jakartaee/validation/blob/1aac3390f131665dd46fadc4a0b13849d0fa0afe/src/main/java/jakarta/validation/Validation.java#L157-L167
Also, considering the history behind clearCache
, it's entirely possible that WebLogic clears the cache by directly retrieving the map through reflection, thereby bypassing the locks. If that's the case, I would suggest reporting a bug to WebLogic. If they go that far, they might as well call clearDefaultValidationProviderResolverCache
reflectively, like other application servers such as WildFly do.
Finally... did you try to run your application with an updated bean-validation JAR that includes my patch (#182), to see if the problem still occurs?
this is a known issue about WeakHashMap, Here are few issues where WeakHashMap is not synchronized is reported https://github.com/eclipse-ee4j/metro-jax-ws/issues/61 https://github.com/spring-projects/spring-loaded/issues/194 https://www.adam-bien.com/roller/abien/entry/endless_loops_in_unsychronized_weakhashmap
Here is the problem as it showed up in Nashorn. https://bugs.openjdk.java.net/browse/JDK-8075006 fixed Nashorn. http://hg.openjdk.java.net/jdk9/jdk9/nashorn/rev/78f82d897305
The Java SE API publicizes that WeakHashMap is not synchronized. In this case, it is being called by javax.validation.Validation$GetValidationProviderListAction.
Hence the fix is suggested at Validation. This been suggested after review from JDK developers.
Weblogic is not calling clearCache using reflection, We are unable to reproduce the issue at will, hence unable confirm the fix of #182
this is a known issue about WeakHashMap
Yes, I've already followed your links and I understand the problem. The question is whether your patch would solve it, considering we already have an equivalent solution (synchronized
keywords on methods accessing the map) in place.
fixed Nashorn. http://hg.openjdk.java.net/jdk9/jdk9/nashorn/rev/78f82d897305
Great minds think alike, then, because they just synchronized all methods accessing the WeakHashMap
. Which is what Bean Validation does, too, in javax.validation.Validation$GetValidationProviderListAction
.
The Java SE API publicizes that WeakHashMap is not synchronized.
Indeed. We do synchronize our own calls to WeakHashMap
, though.
In this case, it is being called by javax.validation.Validation$GetValidationProviderListAction.
... whose methods accessing the WeakHashMap
are all synchronized.
Weblogic is not calling clearCache using reflection
Calling clearCache
would be fine, at least after my patch. I was saying that if WebLogic clears the cache by directly retrieving the map through reflection, then that is a problem, and WebLogic shouldn't do that. I can't check the source code of WebLogic, though, so I don't know what's going on.
We are unable to reproduce the issue at will, hence unable confirm the fix of https://github.com/jakartaee/validation/pull/182
Ok, so as I understand it, you can't confirm my patch solves your problem. I infer you cannot confirm that your patch solves the problem either.
That would be fine if I could understand how on earth your patch could solve the problem, but I don't. Let's hope someone else does and will be able to merge your patch.
Here (getCachedValidationProviders) the synchronization is on class monitor - this is not going to help this situation, the HashMap itself is not synchronized & that is leading to the issue. this has been reviewed by JDK team too. they suggest the Validation.java WeakHashMap not being synchronized is the root cause.