jcuda / jcuda-main

Summarizes the main JCuda libraries
MIT License

BasicBindingTest causes crash in cuPointerSetAttribute #3

Closed jcuda closed 8 years ago

jcuda commented 8 years ago

This was reported as a follow-up to #2 (See the log file at https://github.com/jcuda/jcuda-main/issues/2#issuecomment-161804043 for details)

The BasicBindingTest crashes the JVM in some environments. Specifically, the crash happens in cuPointerSetAttribute.

The BasicBindingTest (as the name suggests) blindly calls all methods with arbitrary parameters - and maybe even invalid ones - just to see whether the methods are correctly wired in the JNI layer. This should be OK, because all errors are caught and ignored (and thus count as "handled"). But if one of the underlying CUDA functions crashes when it receives such invalid parameters, then it may also crash the JVM, and the trace that was posted suggests that this might be the case here.

There are some possible options for investigating this further. First of all, it would be interesting to see whether this only affects cuPointerSetAttribute or other methods as well. The conditions of the crash have to be investigated (is it caused by the parameters? or the device capabilities of the system that the method is executed on?).

In any case, it will likely be necessary to either omit the test for this particular method, or introduce a special treatment.
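The approach of the binding test described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not the actual jcuda.test.BasicBindingTest: the class BindingTestSketch, the stand-in class FakeBindings and the helper defaultValue are made up for this example. It assumes that a missing native implementation shows up as an UnsatisfiedLinkError wrapped in an InvocationTargetException, and that every other exception can be ignored.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical sketch, NOT the actual jcuda.test.BasicBindingTest
class BindingTestSketch
{
    // Hypothetical stand-in for a class that contains native bindings
    static class FakeBindings
    {
        public static int wired(int[] a) { return 0; }
        public static int throwing(Object o) { throw new NullPointerException(); }
        public static int unlinked(int x) { throw new UnsatisfiedLinkError("unlinked"); }
    }

    // Returns an arbitrary (and possibly invalid) argument for the given type
    static Object defaultValue(Class<?> type)
    {
        if (!type.isPrimitive()) return null;
        if (type == boolean.class) return false;
        if (type == char.class) return '\0';
        if (type == byte.class) return (byte) 0;
        if (type == short.class) return (short) 0;
        if (type == int.class) return 0;
        if (type == long.class) return 0L;
        if (type == float.class) return 0.0f;
        return 0.0d; // double
    }

    // Calls the given static method with default arguments. Returns false
    // only if the method could not be invoked or is not linked.
    static boolean testMethod(Method method)
    {
        Class<?>[] types = method.getParameterTypes();
        Object[] args = new Object[types.length];
        for (int i = 0; i < types.length; i++)
        {
            args[i] = defaultValue(types[i]);
        }
        try
        {
            method.invoke(null, args);
        }
        catch (InvocationTargetException e)
        {
            // Exceptions caused by invalid arguments are ignored on purpose.
            // Only a missing native implementation counts as a failure.
            if (e.getCause() instanceof UnsatisfiedLinkError) return false;
        }
        catch (IllegalAccessException | IllegalArgumentException e)
        {
            return false;
        }
        return true;
    }

    // Tests all public static methods of the given class
    static boolean testBinding(Class<?> c)
    {
        boolean passed = true;
        for (Method m : c.getDeclaredMethods())
        {
            int mod = m.getModifiers();
            if (!Modifier.isPublic(mod) || !Modifier.isStatic(mod)) continue;
            passed &= testMethod(m);
        }
        return passed;
    }

    public static void main(String[] args)
    {
        System.out.println("All methods wired: " + testBinding(FakeBindings.class));
    }
}
```

Note that such a test can only catch what reaches the Java side as an exception or error. If a native function dereferences an invalid pointer, the whole JVM process terminates and no catch block is ever reached, which matches the crash described in this issue.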

jcuda commented 8 years ago

@btbouwens Since I cannot reproduce the issue: If you add a blatant statement like

private static boolean testMethod(Method method)
{
    if (method.toString().contains("cuPointerSetAttribute")) return true; // XXX
    ...

in the BasicBindingTest, does it pass? Or are other methods affected as well?

(Of course, the cuPointerSetAttribute documentation mentions some constraints for the valid parameters, but IMHO it should not crash, even when the arguments are invalid, so I'm not sure what the most reasonable (practical) workaround could be in this case...)

btbouwens commented 8 years ago

Then I get a similar crash with

# Problematic frame:
# J 312 C1 jcuda.driver.JCudaDriver.cuCtxGetSharedMemConfig([I)I (8 bytes) @ 0x00007f57bd1c7b20 [0x00007f57bd1c7b20+0x0]

I can provide the full error log if that is helpful.

jcuda commented 8 years ago

OK, I found an embarrassing bug here: Two of the newer functions did not call the native function at all. This is fixed now, and the basic binding test has been extended to cover this case as well.

The observed behavior is still a bit odd, since the error log looks like (and I would even say indicates that) it crashed on the native side.

So at least, a severe bug has been fixed now, but I'm not entirely sure whether your issue is resolved as well - it would be great if you could give it a try.

btbouwens commented 8 years ago

Well, there is still a crash, but it is different now. See the error log: hs_err_pid20717.log.txt

Incidentally, there's been a kernel update too, but I think that's not so relevant.

jcuda commented 8 years ago

OK, the cudaGLUnregisterBufferObject method is, in fact, deprecated since CUDA 3.0, so it's not sooo surprising when it causes "unexpected" behavior.

I know, it's a bit odd, but since I cannot reproduce the issue: Does it still crash when you omit this method, with

private static boolean testMethod(Method method)
{
    if (method.toString().contains("cudaGLUnregisterBufferObject")) return true; // XXX
    ...

In general, I'm considering two things:

In any case, thanks for your support!

btbouwens commented 8 years ago

When I do that, I get a similar crash, but instead of

Stack: [0x00007f7c4b69e000,0x00007f7c4b79f000],  sp=0x00007f7c4b79b948,  free space=1014k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  jcuda.runtime.JCuda.cudaGLUnregisterBufferObjectNative(I)I+0
j  jcuda.runtime.JCuda.cudaGLUnregisterBufferObject(I)I+1
v  ~StubRoutines::call_stub
j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
j  sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
j  java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+56
j  jcuda.test.BasicBindingTest.testMethod(Ljava/lang/reflect/Method;)Z+62
j  jcuda.test.BasicBindingTest.testBinding(Ljava/lang/Class;)Z+66
j  jcuda.test.JCudaBasicBindingTest.testJCuda()V+2
v  ~StubRoutines::call_stub

there is now

Stack: [0x00007f9c204f6000,0x00007f9c205f7000],  sp=0x00007f9c205f3a08,  free space=1014k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  jcuda.driver.JCudaDriver.cuGLInitNative()I+0
j  jcuda.driver.JCudaDriver.cuGLInit()I+0
v  ~StubRoutines::call_stub
j  sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+0
j  sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+100
j  sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;+6
J 307 C1 java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object; (62 bytes) @ 0x00007f9c0cd99dd4 [0x00007f9c0cd999e0+0x3f4]
j  jcuda.test.BasicBindingTest.testMethod(Ljava/lang/reflect/Method;)Z+76
j  jcuda.test.BasicBindingTest.testBinding(Ljava/lang/Class;)Z+66
j  jcuda.test.JCudaBasicBindingTest.testJCudaDriver()V+2
v  ~StubRoutines::call_stub

jcuda commented 8 years ago

So then I assume that all the deprecated GL functions in the driver API on Linux will cause such a crash. You might want to add them successively ...

if (method.toString().contains("cudaGLUnregisterBufferObject")) return true; // XXX
if (method.toString().contains("cuGLInit")) return true; // XXX
...

Otherwise, I'll try to update the test later today so that it allows skipping deprecated functions (not sure how I'll do this yet, but there are several options....)

jcuda commented 8 years ago

So I added proper @Deprecated annotations to the methods that are deprecated in CUDA (formerly, they only had a @deprecated comment in the JavaDocs). The BasicBindingTest skips all @Deprecated methods, as they are not guaranteed to work any more on the CUDA side.
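The reason the annotation is needed can be shown with a small sketch: a @deprecated tag in a JavaDoc comment is not available at runtime, whereas the @Deprecated annotation is retained and can be queried via reflection. The following is only an illustration of that idea, not the actual test code. The names DeprecatedSkipSketch, FakeApi, cuCurrent and cuLegacy are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of skipping deprecated methods in a reflective test
class DeprecatedSkipSketch
{
    // Hypothetical stand-in for a binding class with one deprecated method
    static class FakeApi
    {
        public static int cuCurrent() { return 0; }

        @Deprecated
        public static int cuLegacy() { return 0; }
    }

    // Returns the names of all declared methods that are not @Deprecated
    static List<String> testableMethodNames(Class<?> c)
    {
        List<String> names = new ArrayList<>();
        for (Method m : c.getDeclaredMethods())
        {
            if (m.isSynthetic()) continue;
            // @Deprecated has runtime retention, so it is visible here.
            // A JavaDoc-only @deprecated tag would not be.
            if (m.isAnnotationPresent(Deprecated.class)) continue;
            names.add(m.getName());
        }
        return names;
    }

    public static void main(String[] args)
    {
        System.out.println(testableMethodNames(FakeApi.class));
    }
}
```

Compared to a list of method names that is maintained by hand, this keeps the information about deprecation at the method declaration itself.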

btbouwens commented 8 years ago

With the latest fix, the build finished successfully. Still, I find it slightly odd to have .so files with names not starting with lib:

total 4424
-rw-rw-r-- 1   29111 Dec  7 19:41 jcublas-0.7.5.jar
-rw-rw-r-- 1  860632 Dec  7 19:41 JCublas2-linux-x86_64.so
-rw-rw-r-- 1  193736 Dec  7 19:41 JCublas-linux-x86_64.so
-rw-rw-r-- 1  134289 Dec  7 19:41 jcuda-0.7.5.jar
-rw-rw-r-- 1  247880 Dec  7 19:41 JCudaDriver-linux-x86_64.so
-rw-rw-r-- 1  712320 Dec  7 19:41 JCudaRuntime-linux-x86_64.so
-rw-rw-r-- 1   11427 Dec  7 19:41 jcufft-0.7.5.jar
-rw-rw-r-- 1   79296 Dec  7 19:41 JCufft-linux-x86_64.so
-rw-rw-r-- 1    8670 Dec  7 19:41 jcurand-0.7.5.jar
-rw-rw-r-- 1   75192 Dec  7 19:41 JCurand-linux-x86_64.so
-rw-rw-r-- 1   19832 Dec  7 19:41 jcusolver-0.7.5.jar
-rw-rw-r-- 1  818848 Dec  7 19:41 JCusolver-linux-x86_64.so
-rw-rw-r-- 1   36739 Dec  7 19:41 jcusparse-0.7.5.jar
-rw-rw-r-- 1 1239248 Dec  7 19:41 JCusparse-linux-x86_64.so

jcuda commented 8 years ago

The native libraries are built in the nativeLibrary subdirectory of each project, and, after #2 has been fixed, also with the proper lib... prefix on Linux.

However, the main POM contains a call to the dependency plugin that is invoked in the "package" phase to copy all the native libraries into the build output of the jcuda-main project (to conveniently have them in one place) - and there, the prefixes had still been missing. This should be fixed now as of 005dd0c319fac64b7fe5d2a49238f3018aa062a7
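As background on why the prefix matters: the JVM maps a plain library name to a platform-specific file name, and on Linux this mapping usually adds a lib prefix and a .so suffix. The small sketch below only prints that mapping for one of the names from the listing above. Whether JCuda loads its libraries via System.loadLibrary or extracts and loads them by other means is not stated in this thread, so this is an assumption for illustration. The class name LibraryNameSketch is hypothetical.

```java
// Hypothetical sketch that shows the platform-specific library file name
class LibraryNameSketch
{
    // Returns the file name that the JVM expects for the given library name
    static String platformFileName(String baseName)
    {
        return System.mapLibraryName(baseName);
    }

    public static void main(String[] args)
    {
        // The result depends on the operating system that this runs on
        System.out.println(platformFileName("JCudaDriver-linux-x86_64"));
    }
}
```

On a typical Linux JVM, the printed name is expected to start with lib and end with .so, which is the convention that the files in the listing above did not follow.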

I'll close this one. If you encounter further bugs with the BasicBindingTest, this may be reopened, and for other bugs (related to the build process), a new issue may be created.

Thanks again for your support. It's good to know that the builds now work (more) smoothly on Linux as well.

btbouwens commented 8 years ago

Solution looks fine to me. Thanks.