Closed by jcuda 8 years ago
@btbouwens Since I cannot reproduce the issue: If you add a blatant statement like
    private static boolean testMethod(Method method)
    {
        if (method.toString().contains("cuPointerSetAttribute")) return true; // XXX
        ...
in the BasicBindingTest, does it pass? Or are other methods affected as well?
(Of course, the cuPointerSetAttribute documentation mentions some constraints for the valid parameters, but IMHO it should not crash, even when the arguments are invalid, so I'm not sure what the most reasonable (practical) workaround could be in this case...)
Then I get a similar crash with
# Problematic frame:
# J 312 C1 jcuda.driver.JCudaDriver.cuCtxGetSharedMemConfig([I)I (8 bytes) @ 0x00007f57bd1c7b20 [0x00007f57bd1c7b20+0x0]
I can provide the full error log if that is helpful.
OK, I found an embarrassing bug here: Two of the newer functions did not call the native function at all. This is fixed now, and the basic binding test has been extended to cover this case as well.
The observed behavior is still a bit odd, since the error log looks like (and I would even say indicates that) it crashed on the native side.
So at least, a severe bug has been fixed now, but I'm not entirely sure whether your issue is resolved as well - it would be great if you could give it a try.
Well, there is still a crash, but it is different now. See the error log: hs_err_pid20717.log.txt
Incidentally, there's been a kernel update too, but I think that's not so relevant.
OK, the cudaGLUnregisterBufferObject method is, in fact, deprecated since CUDA 3.0, so it's not sooo surprising when it causes "unexpected" behavior.
I know, it's a bit odd, but since I cannot reproduce the issue: Does it still crash when you omit this method, with
    private static boolean testMethod(Method method)
    {
        if (method.toString().contains("cudaGLUnregisterBufferObject")) return true; // XXX
        ...
In general, I'm considering two things:
@deprecated status
In any case, thanks for your support!
When I do that, I get a similar crash, but instead of
Stack: [0x00007f7c4b69e000,0x00007f7c4b79f000], sp=0x00007f7c4b79b948, free space=1014k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j jcuda.runtime.JCuda.cudaGLUnregisterBufferObjectNative(I)I+0
j jcuda.runtime.JCuda.cudaGLUnregisterBufferObject(I)I+1
v ~StubRoutines::call_stub
j sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
j sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
j java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j jcuda.test.BasicBindingTest.testMethod(Ljava/lang/reflect/Method;)Z+62
j jcuda.test.BasicBindingTest.testBinding(Ljava/lang/Class;)Z+66
j jcuda.test.JCudaBasicBindingTest.testJCuda()V+2
v ~StubRoutines::call_stub
there is now
Stack: [0x00007f9c204f6000,0x00007f9c205f7000], sp=0x00007f9c205f3a08, free space=1014k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j jcuda.driver.JCudaDriver.cuGLInitNative()I+0
j jcuda.driver.JCudaDriver.cuGLInit()I+0
v ~StubRoutines::call_stub
j sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
j sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
J 307 C1 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007f9c0cd99dd4 [0x00007f9c0cd999e0+0x3f4]
j jcuda.test.BasicBindingTest.testMethod(Ljava/lang/reflect/Method;)Z+76
j jcuda.test.BasicBindingTest.testBinding(Ljava/lang/Class;)Z+66
j jcuda.test.JCudaBasicBindingTest.testJCudaDriver()V+2
v ~StubRoutines::call_stub
So then I assume that all the deprecated GL functions in the driver API on Linux will cause such a crash. You might want to add them successively ...
    if (method.toString().contains("cudaGLUnregisterBufferObject")) return true; // XXX
    if (method.toString().contains("cuGLInit")) return true; // XXX
    ...
otherwise, I'll try to update the test later today so that it allows skipping deprecated functions (not sure how I'll do this yet, but there are several options....)
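As a rough illustration of the interim workaround suggested above, the individual `contains` checks could be collected into one skip list. This is only a sketch: `MethodSkipList` and `shouldSkip` are hypothetical names that do not exist in JCuda or its tests, and the two entries are simply the methods that crashed in this thread.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not part of JCuda): decides by name whether a
// method should be left out of a reflective binding test.
class MethodSkipList
{
    // The two methods that were reported to crash in this thread
    private static final Set<String> SKIPPED_NAMES = new HashSet<String>(
        Arrays.asList("cudaGLUnregisterBufferObject", "cuGLInit"));

    // Exact name match, so that similarly named methods are not skipped
    static boolean shouldSkip(String methodName)
    {
        return SKIPPED_NAMES.contains(methodName);
    }

    static boolean shouldSkip(Method method)
    {
        return shouldSkip(method.getName());
    }

    public static void main(String[] args) throws Exception
    {
        // Demonstrated on a JDK method, because JCuda may not be on the class path
        Method method = String.class.getMethod("length");
        System.out.println(method.getName() + " skipped: " + shouldSkip(method));
    }
}
```

Unlike `method.toString().contains(...)`, the exact match on `getName()` does not also cover the private `...Native` counterparts. That should only matter if the test invokes those directly, which the stack traces above do not suggest.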
So I added proper @Deprecated annotations to the methods that are deprecated in CUDA (formerly, they only had a @deprecated comment in the JavaDocs). The BasicBindingTest skips all @Deprecated methods, as they are not guaranteed to work any more on the CUDA side.
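The reason the annotation matters is that a JavaDoc `@deprecated` tag is not visible at runtime, whereas `java.lang.Deprecated` has runtime retention and can be queried via reflection. A minimal sketch of such a check follows. `DeprecatedCheck` and its nested `Sample` class are hypothetical stand-ins, and the actual BasicBindingTest may be structured differently.

```java
import java.lang.reflect.Method;

// Hypothetical sketch (not the actual BasicBindingTest code)
class DeprecatedCheck
{
    // Stand-in for a binding class with one deprecated method
    static class Sample
    {
        @Deprecated
        public static int oldCall() { return 0; }

        public static int currentCall() { return 0; }
    }

    // @Deprecated is retained at runtime, so reflection can see it
    static boolean isDeprecated(Method method)
    {
        return method.isAnnotationPresent(Deprecated.class);
    }

    // Counts the methods that a test would still exercise
    static int countTestable(Class<?> c)
    {
        int count = 0;
        for (Method method : c.getDeclaredMethods())
        {
            if (method.isSynthetic()) continue; // ignore compiler-generated methods
            if (!isDeprecated(method)) count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        System.out.println("Testable methods: " + countTestable(Sample.class));
    }
}
```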
With the latest fix the build finished successfully.
Still I find it slightly odd to have .so-files with names not starting with lib:
total 4424
-rw-rw-r-- 1 29111 Dec 7 19:41 jcublas-0.7.5.jar
-rw-rw-r-- 1 860632 Dec 7 19:41 JCublas2-linux-x86_64.so
-rw-rw-r-- 1 193736 Dec 7 19:41 JCublas-linux-x86_64.so
-rw-rw-r-- 1 134289 Dec 7 19:41 jcuda-0.7.5.jar
-rw-rw-r-- 1 247880 Dec 7 19:41 JCudaDriver-linux-x86_64.so
-rw-rw-r-- 1 712320 Dec 7 19:41 JCudaRuntime-linux-x86_64.so
-rw-rw-r-- 1 11427 Dec 7 19:41 jcufft-0.7.5.jar
-rw-rw-r-- 1 79296 Dec 7 19:41 JCufft-linux-x86_64.so
-rw-rw-r-- 1 8670 Dec 7 19:41 jcurand-0.7.5.jar
-rw-rw-r-- 1 75192 Dec 7 19:41 JCurand-linux-x86_64.so
-rw-rw-r-- 1 19832 Dec 7 19:41 jcusolver-0.7.5.jar
-rw-rw-r-- 1 818848 Dec 7 19:41 JCusolver-linux-x86_64.so
-rw-rw-r-- 1 36739 Dec 7 19:41 jcusparse-0.7.5.jar
-rw-rw-r-- 1 1239248 Dec 7 19:41 JCusparse-linux-x86_64.so
The native libraries are built in the nativeLibrary subdirectory of each project, and, after #2 has been fixed, also with the proper lib... prefix on Linux.
However, the main POM contains a call to the dependency plugin that is invoked in the "package" phase to copy all the native libraries into the build output of the jcuda-main project (to conveniently have them in one place) - and there, the prefixes had still been missing. This should be fixed now as of 005dd0c319fac64b7fe5d2a49238f3018aa062a7
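For context on why the prefix matters: when a library is loaded by name via `System.loadLibrary`, the JVM maps that name to a platform-specific file name, which on Linux typically means adding a `lib` prefix and a `.so` suffix. The mapping can be inspected with `System.mapLibraryName`, as in the small sketch below. Whether JCuda loads its libraries this way or through its own loader is not stated in this thread, so treat that part as an assumption. `LibraryNameDemo` is a hypothetical name.

```java
// Hypothetical demo class: shows how the JVM maps a library base name
// to the file name it expects on the current platform.
class LibraryNameDemo
{
    static String platformFileName(String baseName)
    {
        return System.mapLibraryName(baseName);
    }

    public static void main(String[] args)
    {
        // The result depends on the operating system the JVM runs on
        System.out.println(platformFileName("JCudaDriver-linux-x86_64"));
    }
}
```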
I'll close this one. If you encounter further bugs with the BasicBindingTest, this may be reopened, and for other bugs (related to the build process), a new issue may be created.
Thanks again for your support. It's good to know that the builds now work (more) smoothly on Linux as well.
Solution looks fine to me. Thanks.
This was reported as a follow-up to #2 (See the log file at https://github.com/jcuda/jcuda-main/issues/2#issuecomment-161804043 for details)
The BasicBindingTest crashes the JVM in some environments. Particularly, the crash happens in cuPointerSetAttribute.
The BasicBindingTest (as the name suggests) blindly calls all methods with arbitrary parameters - and maybe even invalid ones - just to see whether the methods are correctly wired in the JNI layer. This should be OK, because all errors are caught and ignored (and this should be "handled"). But if one of the underlying CUDA functions causes a crash when it receives such invalid parameters, then it may also crash the JVM, and the trace that was posted suggests that this might be the case here.
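The general idea of such a test can be sketched as follows. This is not the actual BasicBindingTest code: `BindingTestSketch`, `Sample` and the helper methods are hypothetical, and the sketch assumes that a missing binding shows up as an `UnsatisfiedLinkError`. Every static method is invoked reflectively with default arguments, ordinary exceptions are ignored, and only a failed native lookup counts as a failure.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch of a reflective binding test (not the JCuda code)
class BindingTestSketch
{
    // Stand-in for a binding class: one working method and one native
    // method for which no native library has been loaded
    static class Sample
    {
        public static int wired(int x) { return 0; }
        public static native int unwired(int x);
    }

    // Zero-like default for primitives, null for reference types
    static Object defaultValue(Class<?> type)
    {
        if (!type.isPrimitive()) return null;
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0f;
        return 0d; // double
    }

    // Returns false only if the method could not be called or bound
    static boolean testMethod(Method method)
    {
        Class<?>[] types = method.getParameterTypes();
        Object[] args = new Object[types.length];
        for (int i = 0; i < types.length; i++)
        {
            args[i] = defaultValue(types[i]);
        }
        try
        {
            method.invoke(null, args);
        }
        catch (InvocationTargetException e)
        {
            // A missing native implementation is a real binding failure;
            // everything else thrown by the method is ignored
            if (e.getCause() instanceof UnsatisfiedLinkError) return false;
        }
        catch (UnsatisfiedLinkError e)
        {
            return false;
        }
        catch (IllegalAccessException | IllegalArgumentException e)
        {
            return false;
        }
        return true;
    }

    static int countUnbound(Class<?> c)
    {
        int count = 0;
        for (Method method : c.getDeclaredMethods())
        {
            if (method.isSynthetic()) continue;
            if (!Modifier.isStatic(method.getModifiers())) continue;
            if (!testMethod(method)) count++;
        }
        return count;
    }

    public static void main(String[] args)
    {
        System.out.println("Unbound methods: " + countUnbound(Sample.class));
    }
}
```

Note that none of these `catch` blocks help when the native function itself crashes: a segmentation fault in native code terminates the whole JVM before any Java exception handling can run, which matches the behavior described here.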
There are some possible options for investigating this further. First of all, it would be interesting to see whether this only affects cuPointerSetAttribute or other methods as well. The conditions of the crash have to be investigated (is it caused by the parameters, or by the device capabilities of the system that the method is executed on?).
In any case, it will likely be necessary to either omit the test for this particular method, or introduce a special treatment.