gpu / JOCLBlast

JOCLBlast - Java bindings for CLBlast

NoClassDefFoundError when trying to use CLBlast #3

Open blueberry opened 8 years ago

blueberry commented 8 years ago

I include JOCL-0.2.0-RC1-SNAPSHOT and JOCLBlast-0.0.1-SNAPSHOT in my Clojure leiningen project (basically the same as maven, and it uses maven repositories and poms etc.)

I start the REPL and:

(import 'org.jocl.CL)
;;org.jocl.CL
(CL/setExceptionsEnabled false)
;;nil

So, JOCL loads and works as always.

But, then:

(import 'org.jocl.blast.CLBlast)
;;org.jocl.blast.CLBlast
(CLBlast/setExceptionsEnabled false)
;;NoClassDefFoundError Could not initialize class org.jocl.blast.CLBlast  uncomplicate.neanderthal.examples.guides.tutorial-opencl-test/eval51311 (form-init3160172747902287401.clj:416)

CLBlast itself is available on the system and can be loaded from Java:

(System/loadLibrary "clblast")
;;nil

As JOCL works well, I suspect there is some subtle error with JOCLBlast's library loading in exotic class loader setups (such as in Clojure). This is the complete stack trace:

                      REPL:  432  uncomplicate.neanderthal.examples.guides.tutorial-opencl-test/eval51325
                      REPL:  432  uncomplicate.neanderthal.examples.guides.tutorial-opencl-test/eval51325
             Compiler.java: 6927  clojure.lang.Compiler/eval
             Compiler.java: 6890  clojure.lang.Compiler/eval
                  core.clj: 3105  clojure.core/eval
                  core.clj: 3101  clojure.core/eval
                  main.clj:  240  clojure.main/repl/read-eval-print/fn
                  main.clj:  240  clojure.main/repl/read-eval-print
                  main.clj:  258  clojure.main/repl/fn
                  main.clj:  258  clojure.main/repl
    interruptible_eval.clj:  100  clojure.tools.nrepl.middleware.interruptible-eval/evaluate/fn
                  AFn.java:  152  clojure.lang.AFn/applyToHelper
                  AFn.java:  144  clojure.lang.AFn/applyTo
                  core.clj:  646  clojure.core/apply
                  core.clj: 1881  clojure.core/with-bindings*
                  core.clj: 1881  clojure.core/with-bindings*
               RestFn.java:  425  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   85  clojure.tools.nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  219  clojure.tools.nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
    interruptible_eval.clj:  190  clojure.tools.nrepl.middleware.interruptible-eval/run-next/fn
                  AFn.java:   22  clojure.lang.AFn/run
   ThreadPoolExecutor.java: 1142  java.util.concurrent.ThreadPoolExecutor/runWorker
   ThreadPoolExecutor.java:  617  java.util.concurrent.ThreadPoolExecutor$Worker/run
               Thread.java:  745  java.lang.Thread/run

The tooling (I use emacs and emacs-cider for Clojure) recognizes the CLBlast class and offers code completion, so the class itself is obviously available in the project, but the problem is in the library loading phase.

gpu commented 8 years ago

How much of potential error messages is "hidden" there? I only see

NoClassDefFoundError Could not initialize class org.jocl.blast.CLBlast

as the message, but it does not seem to say which class is not found (and, if it is CLBlast itself, why it could not initialize it).

So first things first: If the initialization throws the ubiquitous UnsatisfiedLinkError, then it would appear there, right?
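For context, here is a general JVM behaviour that may be relevant to the message above (an illustration only, not a diagnosis of this issue). When a static initializer fails, the first access to the class reports the original error. Every later access in the same JVM reports "NoClassDefFoundError: Could not initialize class ..." and does not run the initializer again. The sketch below simulates this. The names InitFailureDemo, Broken and failLoad are hypothetical, and the UnsatisfiedLinkError is thrown by hand rather than by a real native library:

```java
// Hypothetical demo: no real native library is involved.
class InitFailureDemo {

    // Stand-in for a class whose static initializer loads a native library.
    static class Broken {
        static {
            failLoad();
        }

        static void failLoad() {
            // Simulated failure of a native library load
            throw new UnsatisfiedLinkError("simulated: dependency not found");
        }

        static void setExceptionsEnabled(boolean enabled) {
            // Never reached, because the class cannot be initialized
        }
    }

    // Touches the class and describes the Throwable that results.
    static String touch() {
        try {
            Broken.setExceptionsEnabled(false);
            return "no error";
        } catch (Throwable t) {
            return t.getClass().getName() + ": " + t.getMessage();
        }
    }

    public static void main(String[] args) {
        // The first access runs the static initializer and fails with the
        // simulated UnsatisfiedLinkError.
        System.out.println("1st access: " + touch());
        // The second access does not run the initializer again and fails
        // with NoClassDefFoundError instead.
        System.out.println("2nd access: " + touch());
    }
}
```

In a long-running REPL session, this would mean that only the first evaluation shows the underlying error (and any debug output from the initializer). Later evaluations show just the NoClassDefFoundError until the JVM is restarted.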

blueberry commented 8 years ago

The complete stack trace is in my initial message. I can try with some debug printlns in JOCLBlast...

gpu commented 8 years ago

Sorry, I can't figure out what

 uncomplicate.neanderthal.examples.guides.tutorial-opencl-test/eval51325

exactly is (some lambda, obviously...), and what causes the actual error. To understand this better: When you (temporarily) provoke an error, by (temporarily) moving clblast.so to a directory where it can not be found, does this cause the same error?

blueberry commented 8 years ago

I doctored CLBlast class with printlns like this:

static
    {
        System.out.println("============= 1:");
        String libraryBaseName = "JOCLBlast_0_0_1";
        System.out.println("============= 2:");
        String libraryName =
            LibUtils.createPlatformLibraryName(libraryBaseName);
        System.out.println("============= 3:");
        String dependentLibraryNames[] = { "clblast" };
        System.out.println("============= 4:");
        try
        {
            LibUtils.loadLibrary(libraryName, dependentLibraryNames);
        }
        catch (UnsatisfiedLinkError e)
        {
            System.out.println("============= 5:");
            e.printStackTrace();
            throw e;
        }
    }

But nothing gets printed before the exception. It seems to me that Java tries to load some class that is needed before it even gets to the CLBlast initialization, but I cannot figure out which class that would be. As I've mentioned, org.jocl.CL works fine in the same setting, so I would look for potential problems in the differences between these two classes.

blueberry commented 8 years ago

That eval is this: (CLBlast/setExceptionsEnabled false) It is evaluated dynamically in the repl, that's why it gives the line you mentioned.

When I remove libclblast.so, the error is the same. I doubt that the problem is related to libclblast, because (System/loadLibrary "clblast") works fine.

gpu commented 8 years ago

(This aimed at finding out whether there might be a problem with loading the JOCLBlast library because it did not "match" the CLBlast library in some way - but I assume that you compiled both from the current code state, so if there would have been differences, they would have caused linker errors and such).

Regarding the possible reasons for the NoClassDefFoundError ... this is unusual, to say the least. If at all, then this error may be caused by some outdated JAR reference or so. There are only a few other classes in the clblast package, and I assume that they can be loaded.

Sorry, the only thing that I could suggest now is some tedious "binary search": Starting with an empty CLBlast class, only containing the setExceptionsEnabled method, which sets a flag or prints some debug output, and see whether it can be loaded. Then adding the static initializer block. Then adding a single method and its native counterpart. (Each step will require more imports - maybe the culprit can be identified then...)

blueberry commented 8 years ago

@gpu Believe it or not, I identified the troublemaker: I changed the CLBlast static initializer to this:

static
    {
        String libraryBaseName = "JOCLBlast_0_0_1";
        String libraryName =
            LibUtils.createPlatformLibraryName(libraryBaseName);
        //String dependentLibraryNames[] = { "clblast" };
        try
        {
            LibUtils.loadLibrary(libraryName);//, dependentLibraryNames);
        }
        catch (UnsatisfiedLinkError e)
        {
            throw e;
        }
    }

And now JOCLBlast loads, and does call appropriate methods.

However, when I call CLBlastSscal, I get this OpenCL error through OpenCL's error reporting mechanism:

(with-default
  (with-default-engine
    (with-release [x (sv-cl (range 3))]
      (CLBlast/CLBlastSscal 3 6.0 (uncomplicate.clojurecl.core/cl-mem (.buffer x)) 0 1 *command-queue* nil)
      (asum x))))
;;CLException INVALID error code: -2048  org.jocl.blast.CLBlast.checkResult (CLBlast.java:96)

It seems that JOCLBlast (or CLBlast) does not like the (valid) command queue that I gave it as an argument, but I will investigate that tomorrow.

So, it seems that explicitly loading clblast causes problems. Why are you calling that explicitly? Shouldn't it be handled by the OS, since I suppose that you dynamically link clblast when you build joclblast through make? In neanderthal-atlas, the lib loader only loads libneanderthal, and atlas is provided automatically I think...

I hope that what I discovered helps you in identifying what could be wrong. If you have any other ideas that I should try, please tell.

blueberry commented 8 years ago

This error code is from CLBlast:

kKernelLaunchError         = -2048, // Problem occurred when enqueuing the kernel

blueberry commented 8 years ago

I am almost sure about the problem cause: I have 3 GPUs (the 1st supports OpenCL 1.2, while the 2nd and the 3rd support OpenCL 2.0). I use the 2nd to create the context and command queue, but CLBlast might use the 1st (OS's default) to build the program. That's why the command queue that I supply is invalid.

Do you think this could be the issue? How to set the device and context that CLBlast uses from JOCLBlast?

gpu commented 8 years ago

The error code is something that I'd first investigate in native CLBlast (maybe I can try to see whether there is a test case for (the C-version of) CLBlastSscal tomorrow or early next week).

Now, that the problem is caused by manually loading the CLBlast library is once again irritating. Although that's what I was aiming at when I said to remove the CLBlast lib temporarily: Shouldn't it throw an UnsatisfiedLinkError when there is a problem with loading the library? I wonder how this can cause a NoClassDefFoundError...

The reason why I'm loading the library manually is that the goal was to have one deployable JAR, with the native JOCLBlast library and the matching native CLBlast library. One cannot say to a Java developer: "Here are the Maven coordinates of JOCLBlast .... but you have to build the CLBlast library on your own and put it into the LD_LIBRARY_PATH". It should be possible to just use the JAR. And in order to load the native JOCLBlast library, the native CLBlast library has to be loaded first.
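The load order described above can be sketched roughly as follows. This is a minimal illustration and not the actual LibUtils code: the class NativeLoaderSketch, its methods, and the idea of a single resource root inside the JAR are all assumptions made for the example. Each library is extracted from the JAR into one temporary directory under its original file name, and the dependencies are loaded before the JNI library that needs them:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; not the LibUtils implementation.
class NativeLoaderSketch {

    // Returns the file names in the order they must be loaded: dependencies first.
    static List<String> loadOrder(String jniLibrary, String... dependencies) {
        List<String> order = new ArrayList<>(Arrays.asList(dependencies));
        order.add(jniLibrary);
        return order;
    }

    // Copies a classpath resource into targetDir, keeping its file name,
    // so that System.load can be called with a real file path.
    static Path extract(Class<?> owner, String resourcePath, Path targetDir)
            throws IOException {
        try (InputStream in = owner.getResourceAsStream(resourcePath)) {
            if (in == null) {
                // Report a missing resource the same way a failed load would be reported
                throw new UnsatisfiedLinkError("Not found in JAR: " + resourcePath);
            }
            String fileName = resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
            Path target = targetDir.resolve(fileName);
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        }
    }

    // Extracts and loads each library in the given order.
    // resourceRoot is an assumed location inside the JAR, ending with "/".
    static void loadAll(Class<?> owner, String resourceRoot, List<String> fileNames)
            throws IOException {
        Path dir = Files.createTempDirectory("natives");
        for (String name : fileNames) {
            Path extracted = extract(owner, resourceRoot + name, dir);
            System.load(extracted.toAbsolutePath().toString());
        }
    }
}
```

With this shape, a dependency that was never packaged into the JAR fails at the extraction step, before the JNI library itself is touched.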

Of course, I tried this locally, on Windows, but obviously, this was once again not sufficient. Did you have the chance to test JOCLBlast on Linux with plain Java? (Maybe using the sample from this forum post: https://forum.byte-welt.net/byte-welt-projekte-projects/jocl/18180-joclblas-java-bindings-clblas-2.html#post131422 ).

(I just wonder whether this is also related to the Clojure classloading black-box, or whether it's a general issue...)

gpu commented 8 years ago

Sorry, these overlapped:

Do you think this could be the issue? How to set the device and context that CLBlast uses from JOCLBlast?

This can certainly be a reason for a command queue being invalid. I'll have to take a closer look at the inner workings of CLBlast to see how this could be solved, I'll try to do this tomorrow (it's already late here)

blueberry commented 8 years ago

We are in the same GMT+1 time zone I suppose :)

Regarding the libclblast availability: If you would like to include CLBlast in the jar, wouldn't it be easier to compile the JOCLBlast native library statically? I thought that you chose dynamic loading to enable the user to tune CLBlast (because the tuning database it uses only contains a dozen cards, otherwise it uses non-optimal defaults).

blueberry commented 8 years ago

CLBlast reference: https://github.com/CNugteren/CLBlast/issues/43

blueberry commented 8 years ago

After restarting the JVM, CLBlast function calls work well with all queues. The problem was due to an initial failure in the interaction with CLBlast's internal resource caching mechanism.

The problem with System.loadLibrary remains; that is, JOCLBlast works on my system with explicit loading disabled.

gpu commented 8 years ago

Statically linking might be an option for this particular setup - or so to say, for each particular setup. I actually wanted to cover the generic case, of one JNI library depending on (possibly multiple, possibly precompiled) other dynamic libraries.

I still wonder whether this is a general problem with the dependency loading mechanism (which was introduced only recently), or whether the "root" cause is the same that caused the problems with Sizeof in https://github.com/gpu/JOCL/issues/5 .

I'll probably have to take a closer look at Clojure and its native library handling. A quick websearch brings some results (one (of many!) on stackoverflow). But also one that might be strongly related to our case: https://groups.google.com/forum/#!topic/clojure-dev/awe7-yeieIM (and the lower part of the thread that is linked from there: https://groups.google.com/forum/#!topic/clojure/br_sTSuWBJ8/discussion ). I'll try to read through this and related pages during the weekend.

blueberry commented 8 years ago

Certainly, a general and flexible solution would be great.

In this particular case, I would be satisfied with at least a temporary solution (for example, a statically linked library) that would enable me to release a version of Neanderthal that relies on a JOCLBlast preview release. Nobody expects JOCLBlast to offer a perfect solution in version 0.0.1 or 0.0.2 - it is enough if it is globally available for windows and linux. A decent or even barely working solution available to a handful of early adopters is much better than a solution waiting for perfection available to almost nobody, IMO.

gpu commented 8 years ago

Sure, I see, such a release may be helpful to gain some early feedback which is largely independent of how the library is loaded internally. Nevertheless, this issue (and the Sizeof one) make me think that there are some hidden caveats related to native libraries in Clojure.

I can try to create a statically linked version in the next days. (And I agree: For the case of JOCLBlast this might even be the long-term solution). But I'd really like to know what went wrong here and in the Sizeof case.

gpu commented 8 years ago

Just a short update: I read the linked posts, but don't think it will be helpful to just read them unless I also try it out. Attempts to set up all this resulted in google journeys, the last one being about

CompilerException java.lang.Exception: No namespace: uncomplicate.neanderthal.impl.buffer-block, compiling:(uncomplicate/neanderthal/impl/cblas.clj:1:1) 

but the classpath and namespace management of Clojure just seems (horribly) opaque to me at the moment, so it may take some time to get it running.

blueberry commented 8 years ago

What's your setup? If you need help with setting up development environment, I can help.

blueberry commented 8 years ago

And I have just pushed the recent changes of ClojureCL and Neanderthal (JOCL-RC1 dependency) to the master at github. The version that you had was a pre-JOCL-RC1.

gpu commented 8 years ago

I'm at work ATM, and can likely not do (many) further tests before tomorrow evening. But I used the "Counterclockwise" Eclipse Plugin, clojure 1.6.0 (tried 1.8.0 as well), and neanderthal 0.5.0. There had been another issue first: It did not seem to find some sort of "initializer class" of neanderthal. After some refreshing ("...switching it off and on again" ;-)) this worked, but then it complained about the missing namespace. However, I'm pretty sure that this is due to my lack of understanding of Clojure concepts and the setup in general, I'll try again (if possible, tomorrow), using your "hello world" example.

Again, I think that statement from https://groups.google.com/d/msg/clojure/br_sTSuWBJ8/NXppL1EWZhAJ

What is happening is that the vtk java files and your clojure code are using different classloaders (clojure uses its own classloader).

System/loadLibrary is kind of crippled in that it always loads the library into the ClassLoader of the invoking class's classLoader. I was hoping it would use the Thread's context classloader, but it does not.

There isn't any straightforward way to load a library using a particular classloader either, so you have 2 options.

might describe the core issue here (this is somewhere between a "gut feeling" and an "educated guess")
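To make the quoted point a bit more concrete: System.loadLibrary associates the native library with the class loader that defined the calling class, and the thread's context class loader plays no role in that. The sketch below only shows that the two loaders can differ. It does not load any native library, and the anonymous child loader is merely a stand-in for a custom loader such as the one a Clojure REPL installs. LoaderIdentityDemo and its methods are hypothetical names:

```java
// Hypothetical demo of "defining loader" versus "context loader".
class LoaderIdentityDemo {

    // The loader that defined this class. A System.loadLibrary call made
    // from this class would bind the native library to this loader.
    static ClassLoader definingLoader() {
        return LoaderIdentityDemo.class.getClassLoader();
    }

    // Installs a fresh child loader as the thread's context loader and
    // reports whether it differs from the defining loader of this class.
    static boolean contextLoaderDiffers() {
        Thread thread = Thread.currentThread();
        ClassLoader original = thread.getContextClassLoader();
        // Stand-in for a custom loader; it delegates everything to its parent
        ClassLoader child = new ClassLoader(definingLoader()) { };
        try {
            thread.setContextClassLoader(child);
            return thread.getContextClassLoader() != definingLoader();
        } finally {
            // Restore the previous context loader
            thread.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        System.out.println("defining loader: " + definingLoader());
        System.out.println("context loader differs: " + contextLoaderDiffers());
    }
}
```

Changing the context loader does not change which loader owns LoaderIdentityDemo, which is why setting it would not redirect where a library loaded from this class ends up.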

blueberry commented 8 years ago

See my previous comment about the Neanderthal version. Also, Neanderthal requires Clojure 1.8.0 (see its leiningen project.clj for other dependencies). Counterclockwise usually works ok-ish if you know what you are doing, but because Eclipse likes to use its own project settings, sometimes you need to synchronize it with leiningen by hand from the right-click menu. Imho, Eclipse is hugely inferior to emacs (cider https://github.com/clojure-emacs/cider and prelude http://batsov.com/prelude/) in that regard for languages such as Clojure (and not many people are using it in the Clojure world). Of course, if you are used to VisualStudio or Eclipse, emacs+cider could be a huge culture shock. Fortunately, for the hello world, command line leiningen might do what you need without much fuss.

In your case, maybe an easier path would be:

  1. Install Leiningen from leiningen.org (version greater than 2.6.0).
  2. Clone the latest ClojureCL (which depends on the JOCL-RC01-SNAPSHOT that you have) and test it using lein midje (the equivalent of mvn test).
  3. If it works, run lein install.
  4. Clone the latest Neanderthal. To build it, the test phase (midje) needs to find the ATLAS BLAS library on your machine. If you do not have that, you can try to just build it without testing: lein install (but then you won't trigger those class-loading exceptions).
  5. Then try hello world (but update the neanderthal dependency to 0.6.0-SNAPSHOT) to explore the class-loading errors.
  6. If this is too confusing, let me know, and I'll send you the uncomplicate parts of my maven repository, so you won't have to do those builds.

blueberry commented 8 years ago

Hah, now I remembered that ClojureCL's tests rely on OpenCL 2.0 functionality, for which you'll need an AMD GPU...

blueberry commented 8 years ago

And another idea, much easier for you, since it does not require my libraries at all:

  1. Create a new, clean leiningen-based Clojure project.
  2. In project.clj, add dependencies on JOCL and JOCLBlast.
  3. Start the REPL (in Eclipse or wherever).
  4. Use Clojure's Java interoperability to import and call JOCL methods that you already know well.

blueberry commented 8 years ago

@gpu BTW, a user of Neanderthal, who is very interested in using it for GPU computing on Nvidia and OSX and has access to a bunch of Macs, volunteered to compile JOCLBlast (and JOCL) for OSX. I hope this will speed up the release cycle, now that we can cover all three major OSes quickly.

gpu commented 8 years ago

So I had another short look at this, and after reading https://sourceforge.net/p/math-atlas/discussion/1026734/thread/871639fe/ I frankly have to say that ATLAS is obviously not meant to be used by people like me, and I certainly will not waste my time with that.

(At least, I understand the burdens that he mentions in the FAQ entry that he linked. Some pretty bold statements for someone who is publicly funded and not just spending his spare time to make other people's lives easier, but hey, I won't deny him his right to complain...)

I'll try the second approach that you described, probably tomorrow, if I can schedule it.

blueberry commented 8 years ago

OK. Please ask for any help with the setup, if needed. To reproduce the exceptions, you can use those few lines of code I posted in the first message.

gpu commented 8 years ago

OK, after a rather frustrating journey and a few hours of debugging, I have to consider that I'm just plainly stupid - but I need your help to verify this:

The actual reason for the problems may be that I forgot to mention a crucial step of building JOCLBlast: After compiling the native library, the resulting library libJOCLBlast_0_0_1-linux-x86_64.so will be placed in the nativeLibraries directory. Fine.

Now ... can you add the native libclblast.so into a subdirectory of this nativeLibraries directory, so that the structure afterwards will be as follows: (EDIT: Updated - see the following comments)

JOCLBlast/
    nativeLibraries/ 
        libJOCLBlast_0_0_1-linux-x86_64.so
        linux/
            x86_64/
                libclblast.so

Then, after a mvn clean install, the JAR should contain the JOCLBlast native and the CLBlast native (in the appropriate subdirectory).

(I think this might be the reason for the problems, because the library loader will try to load the dependency from the JAR as well - and until now, it can simply not find it...)

If this does not solve the issue, I'll post the (other) results of my debugging journey here, maybe one of them yields a solution, eventually...

blueberry commented 8 years ago

I can only verify that you're an exceptionally smart guy! :)

I am out of town until Friday, so I'll try this then. At first glance, it looks like the right thing.

On a side note, I can verify that I've tried several functions and they work like a charm. I found a few issues in CLBlast and reported them to Cedric. The problems are identified, and they seem to be easily solvable (when he finds time to fix them).

Thank you very much for the great work!

blueberry commented 8 years ago

I found the means to test this and can confirm that you are a genius, sir!

The problem is solved after reverting CLBlast.java to load the library, AND copying libclblast.so to the appropriate directory, just as you described.

blueberry commented 8 years ago

@gpu In addition to your description, I have to add that:

  1. JOCLBlast/nativeLibraries/linux/x86_64/libclblast.so worked
  2. from the formatting in your example it looked like JOCLBlast/linux/x86_64/libclblast.so - that did NOT work

but that was probably an unintentional mistake in the formatting. The description was OK.

gpu commented 8 years ago

Of course, you're right, apologies, it should indeed be

JOCLBlast/
    nativeLibraries/ 
        libJOCLBlast_0_0_1-linux-x86_64.so
        linux/
            x86_64/
                libclblast.so

(I had added these subdirectories because there may be different dependencies with the same names - it would, in some sense, be easier to rename the dependencies according to the -version-OS-ARCH pattern (and this would have made this bug less likely), but it's hard to say what would be the "best" solution).
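The two naming options mentioned here can be compared side by side in a small sketch. It is only an illustration under stated assumptions: NativeLayoutSketch and its methods are hypothetical, the OS tokens "linux" and "windows" are assumed, and the paths are relative to the nativeLibraries directory rather than to whatever root the loader really uses:

```java
// Hypothetical helper; the OS tokens "linux" and "windows" are assumptions.
class NativeLayoutSketch {

    // Library file name prefix for the given OS token
    static String prefix(String os) {
        return os.equals("windows") ? "" : "lib";
    }

    // Library file extension for the given OS token
    static String extension(String os) {
        switch (os) {
            case "windows": return "dll";
            case "linux":   return "so";
            default: throw new IllegalArgumentException("Unsupported OS token: " + os);
        }
    }

    // Option 1: keep the plain file name and place it in an OS/arch subdirectory.
    static String subdirectoryPath(String baseName, String os, String arch) {
        return os + "/" + arch + "/" + prefix(os) + baseName + "." + extension(os);
    }

    // Option 2: encode OS and arch in the file name itself.
    static String suffixedName(String baseName, String os, String arch) {
        return prefix(os) + baseName + "-" + os + "-" + arch + "." + extension(os);
    }

    public static void main(String[] args) {
        // prints "linux/x86_64/libclblast.so"
        System.out.println(subdirectoryPath("clblast", "linux", "x86_64"));
        // prints "libJOCLBlast_0_0_1-linux-x86_64.so"
        System.out.println(suffixedName("JOCLBlast_0_0_1", "linux", "x86_64"));
    }
}
```

Option 1 leaves the dependency's file name untouched, which matters when the JNI library refers to it by that name. Option 2 keeps everything in one flat directory.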

I'll update the README of JOCLBlast and JOCLBLAS accordingly tomorrow (like it has been promised there for quite a while now...), and explain how the dependencies can be handled.

Still, as you suggested, statically linking CLBlast is also a reasonable option, but it's good to see that the dependency loading basically works (when done correctly).

I'll leave this open until I have updated the documentation and things have settled and been tested more extensively.

Thanks for your great support!

gpu commented 8 years ago

I have added a section in the readme of https://github.com/gpu/JOCL for "Building and packaging the external native library dependencies" that explains the placement of the dependent libraries.

I don't like the manual step that is involved there. I think it should be possible to apply some CMake magic to automatically copy the dependency (e.g. clblast.dll) into the appropriate target subdirectory of the nativeLibraries folder - this would avoid some hassle and potential bugs.
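The copy step itself is small. The comment above has CMake in mind, so the following is a swapped-in illustration in plain Java and not a proposal for the actual build. CopyDependencySketch, its method, and the command-line arguments are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical build helper; a CMake rule could do the same at build time.
class CopyDependencySketch {

    // Copies builtLibrary into <nativeLibraries>/<os>/<arch>/, creating the
    // subdirectories if needed, and returns the path of the copy.
    static Path copyDependency(Path builtLibrary, Path nativeLibraries,
                               String os, String arch) throws IOException {
        Path targetDir = nativeLibraries.resolve(os).resolve(arch);
        Files.createDirectories(targetDir);
        Path target = targetDir.resolve(builtLibrary.getFileName());
        return Files.copy(builtLibrary, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 4) {
            System.err.println("usage: <library> <nativeLibraries dir> <os> <arch>");
            return;
        }
        Path copied = copyDependency(Paths.get(args[0]), Paths.get(args[1]), args[2], args[3]);
        System.out.println("Copied to " + copied);
    }
}
```

Called with libclblast.so, JOCLBlast/nativeLibraries, linux and x86_64, this would produce the JOCLBlast/nativeLibraries/linux/x86_64/libclblast.so placement confirmed earlier in this thread.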

I'll leave this issue open for now, and consider such a solution and other alternatives. Although it's not the most pressing task, it's certainly related to others (simplifying the build, the maven deployment and packaging, and the versioning of JOCL itself, but mainly of JOCLBlast referring to CLBlast...).