jblas-project / jblas

Linear Algebra for Java
http://jblas.org
BSD 3-Clause "New" or "Revised" License
590 stars 149 forks source link

Fix incorrect use of JNIEnv. It is not supposed to be cached. #84

Closed azenkov closed 7 years ago

azenkov commented 7 years ago

Hello,

I came across this problem when we started to use jblas in our distributed computation clusted in a multithreaded environment. JVM started to crash constantly giving us some weird crash dumps jblas was mentioned there very often but crash dumps were all over the place and we couldn't pin point the problem. After looking at the JNI wrapper I noticed that xerbla() callback uses cached JNIEnv* which is not recommened. I made a multithread test for xerbla which was just calling SanityChecks.checkXerbla() in multiple threads simultaneously. When I set a good mount of threads ~ 100 the test made jvm to crash every time. Here is proposed fix for this. After this change my test passes (no crashes) with any amount of concurrent threads for extended periods of time. I didn't include this test in PR because I was not sure it will be desirable to have long running unit tests. Please consider including this fix into the main branch. Jblas is a very useful library it would be great to make more reliable.

Thank you.

mikiobraun commented 7 years ago

Hi azenkov,

aaah, I see what I did there. Yes, that obviously wouldn't work. A bit swamped with work right now, but I'll fully review & merge the PR in the next few days.

Thanks for your efforts!

-M

beyondpie commented 7 years ago

Hi, mikiobraun, Thanks a lot for Jblas. And does it merge in the new one? Since I might run Jblas in the same situation. Songpeng

mikiobraun commented 7 years ago

Hello azenkov,

finally found the time to look into this. Thanks a lot for your contribution! I will merge it, then do a little bit of cosmetic clean-up.

Reading this: https://stackoverflow.com/questions/9642506/jni-attach-detach-thread-memory-management it seems that one should not call AttachCurrentThread all the time as it potentially create a new instance, as well as Detaching afterwards. I will fix both these things afterwards.

Best, -M