uncomplicate / neanderthal

Fast Clojure Matrix Library
http://neanderthal.uncomplicate.org
Eclipse Public License 1.0

Not able to run native tests of neanderthal successfully #127

Closed behrica closed 2 years ago

behrica commented 2 years ago

While we were discussing https://github.com/uncomplicate/deep-diamond/issues/15, a separate issue came up: Neanderthal no longer finds libmkl_rt.so, even when it is globally installed.

I prepared a Dockerfile that reproduces the issue; it may be useful.

# failing with
# Execution error (UnsatisfiedLinkError) at java.lang.ClassLoader$NativeLibrary/load0 (ClassLoader.java:-2).
#/tmp/libneanderthal-mkl-0.33.07653633467081296505.so: libmkl_rt.so: cannot open shared object file: No such file or directory

FROM clojure:lein-2.9.8-focal
RUN apt-get update && apt-get -y install git wget python3
RUN wget https://registrationcenter-download.intel.com/akdlm/irc_nas/18721/l_onemkl_p_2022.1.0.223.sh
RUN sh ./l_onemkl_p_2022.1.0.223.sh -a --silent  --eula accept

RUN git clone https://github.com/uncomplicate/neanderthal.git

WORKDIR /tmp/neanderthal
RUN git checkout e01511ff47605f2e4031d58899b303e4435d58e3
RUN lein test uncomplicate.neanderthal.mkl-test
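
One way to narrow down the `libmkl_rt.so: cannot open shared object file` error is to check whether the library exists in a directory the dynamic loader is likely to search. The following is a minimal shell sketch, not part of Neanderthal: `find_shared_lib` is a hypothetical helper, and the oneAPI path in `MKL_LIB_DIR` is an assumed default install location that may differ on your system.

```shell
# Hypothetical helper: print the first location where a shared library exists.
# Arguments: library file name, then one or more directories to search.
find_shared_lib() {
  lib="$1"
  shift
  for dir in "$@"; do
    if [ -e "$dir/$lib" ]; then
      printf '%s\n' "$dir/$lib"
      return 0
    fi
  done
  return 1
}

# Assumed default oneAPI location; override MKL_LIB_DIR for your installation.
MKL_LIB_DIR="${MKL_LIB_DIR:-/opt/intel/oneapi/mkl/latest/lib/intel64}"

if found=$(find_shared_lib libmkl_rt.so "$MKL_LIB_DIR" /usr/lib /usr/local/lib); then
  echo "found: $found"
else
  echo "libmkl_rt.so not found in the searched directories"
fi
```

If the library exists but its directory is not on the loader's search path, adding that directory to `LD_LIBRARY_PATH` before starting the JVM is one common remedy. Whether that is sufficient for this Dockerfile is not confirmed in this thread.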
behrica commented 2 years ago

Thanks. The ClojureCUDA documentation currently says this, which seems to mean "any CUDA 11.x":

Minimum requirements
Java 8
CUDA Toolkit 11.0 (prefer 11.4)
Linux or Windows. macOS doesn’t allow CUDA from version 11 and up. You can only use an old release of ClojureCUDA on macOS.

I hope these comments help; if not, let me know...

blueberry commented 2 years ago

I updated the docs of ClojureCUDA to clarify this.

You can use any CUDA version with ClojureCUDA. However, if the CUDA version on your system does not match the one that ClojureCUDA depends on in project.clj, you have to specify an explicit dependency on the matching JCuda version in YOUR project.clj.

It's similar for Neanderthal, but a very outdated CUDA might not support all the features that I use, and might break unpredictably.

Ditto for Deep Diamond (DD), but I don't expect old CUDA versions to work successfully.
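
To see which JCuda version a project actually resolves before overriding it in project.clj, one option is to filter the output of `lein deps :tree`. This is a rough sketch: `resolved_jcuda` is a hypothetical helper, and it assumes the dependency tree prints entries in the usual `[group/artifact "version"]` form.

```shell
# Hypothetical helper: read `lein deps :tree` output on stdin and print
# each org.jcuda artifact with its resolved version, one per line.
resolved_jcuda() {
  sed -n 's|.*\[org\.jcuda/\([^ ]*\) "\([^"]*\)".*|\1 \2|p'
}

# Usage (run in the project directory; requires Leiningen):
#   lein deps :tree 2>/dev/null | resolved_jcuda
```

If the printed version differs from the CUDA release installed on the machine, that is the case where an explicit JCuda dependency in your own project.clj is needed.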

jsa-aerial commented 2 years ago

Agree, but I would hope that over time all vendors will produce "one driver", which works for all their GPUs.

I would say there is zero chance of this happening. Not a small chance, but no chance. There are too many legitimate reasons for them to not do this.

blueberry commented 2 years ago

Thanks. The ClojureCUDA documentation currently says this, which seems to mean "any CUDA 11.x":

Minimum requirements
Java 8
CUDA Toolkit 11.0 (prefer 11.4)
Linux or Windows. macOS doesn’t allow CUDA from version 11 and up. You can only use an old release of ClojureCUDA on macOS.

I hope these comments help; if not, let me know...

And it DOES (give or take a detail or two), but you have to state that explicit version, and the version in your project.clj has to match the version installed on your machine. If you specify 11.4 in project.clj while you install whatever CUDA is shipped with Arch (11.7 currently, I believe), it will not work.

Which brings us to one detail: if you do this today, the default JCuda version that Neanderthal/ClojureCUDA uses is 11.6. You have to have that on your OS. CUDA 11.7 is not supported yet (although it might now be what Arch installs by default).
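
The matching requirement can be checked mechanically. The sketch below assumes that "matching" means agreement on the major.minor part of the version, and that `nvcc --version` reports the toolkit as `release X.Y`; the function names are hypothetical helpers, not part of ClojureCUDA or JCuda.

```shell
# Reduce a version such as 11.6.1 to its "major.minor" part (11.6).
major_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Read `nvcc --version`-style text on stdin and print the toolkit release.
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Succeed when two versions agree on major.minor (the assumed matching rule).
versions_match() {
  [ "$(major_minor "$1")" = "$(major_minor "$2")" ]
}

# Compare the JCuda version named in this thread with the installed toolkit.
jcuda_version="11.6.1"
if command -v nvcc >/dev/null 2>&1; then
  installed=$(nvcc --version | cuda_release)
  if versions_match "$jcuda_version" "$installed"; then
    echo "JCuda $jcuda_version matches CUDA $installed"
  else
    echo "mismatch: JCuda $jcuda_version vs CUDA $installed"
  fi
fi
```

On a machine without `nvcc` on the PATH the comparison is skipped silently, so in a Docker build you may prefer to fail instead.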

blueberry commented 2 years ago

As far as I can see, your system has CUDA 11.6.1, which is exactly what is expected, so you should not change any default. 11.4.1 generally shouldn't work on your machine (or if it does, it's more luck than anything else).

behrica commented 2 years ago

Yeah, one reason for me to insist on Docker is "multiple computers". I use GPUs on Azure cloud VMs, and I would like to avoid configuring each of them individually. They are kind of "temporary resources". But I now understand the "precise matching requirement" better; thanks for the clarification.

blueberry commented 2 years ago

/tmp/libJCudaDriver-11.6.1-linux-x86_64.so: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/libJCudaDriver-11.6.1-linux-x86_64.so)



Explicitly downgrading to "[org.jcuda/jcuda "11.4.1"]" solved it.

This is a known gotcha in JCuda. Your system has an old GLIBC. Not your Arch Linux, which is up to date, but your Docker-provided system, which is, if I remember correctly, Ubuntu 20 or so. That one ships with a somewhat older GLIBC. The trouble with GLIBC is that its version is so fundamentally hard-coded in your environment that it's very difficult to use another one; you have to use the one provided by your system. And your system provides an old one, which breaks JCuda, which was compiled against a recent one.

Your Arch Linux should work, am I correct?

Native dependencies are tricky ;)
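
The GLIBC mismatch described above can be detected before the JVM tries to load the native library. This is a minimal sketch that assumes a glibc-based system where `getconf GNU_LIBC_VERSION` is available; `version_at_least` is a hypothetical helper that compares only the major and minor components, and the required version is taken from the `GLIBC_2.34` error message.

```shell
# Succeed when version $1 (e.g. 2.31) is at least version $2 (e.g. 2.34).
# Only the major and minor components are compared.
version_at_least() {
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  need_major=${2%%.*}
  need_rest=${2#*.}
  need_minor=${need_rest%%.*}
  if [ "$have_major" -ne "$need_major" ]; then
    [ "$have_major" -gt "$need_major" ]
  else
    [ "$have_minor" -ge "$need_minor" ]
  fi
}

required="2.34"   # from the GLIBC_2.34 error message
# getconf prints something like "glibc 2.31"; keep only the version field.
have=$(getconf GNU_LIBC_VERSION 2>/dev/null | awk '{print $2}') || have=""

if [ -n "$have" ]; then
  if version_at_least "$have" "$required"; then
    echo "glibc $have satisfies the required $required"
  else
    echo "glibc $have is older than the required $required"
  fi
fi
```

Running this inside the container rather than on the host matters here, because the two can ship different GLIBC versions, as was the case in this thread.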

behrica commented 2 years ago

Yes, it is about that. It took me a while to figure it out.

blueberry commented 2 years ago

Fortunately, you can help solve it. That would require that you build JCuda on your (older) system, and these binaries will then work on newer systems too!

Please check out this issue: https://github.com/jcuda/jcuda-main/issues/51

behrica commented 2 years ago

I created PR #128 with a minimal example Docker setup. So for me we can close this issue here.