uncomplicate / neanderthal

Fast Clojure Matrix Library
http://neanderthal.uncomplicate.org
Eclipse Public License 1.0

Possibly use maven to resolve native deps #97

Closed: joinr closed this issue 4 years ago

joinr commented 4 years ago

I recently noticed that the smile library started bundling MKL bindings and claimed to have native deps resolved automatically. It looks like they are using artifacts from bytedeco mkl, which bundles the native dependencies and makes them available on maven. So, in my naive theory, adding

[org.bytedeco/mkl-platform "2020.1-1.5.3"]
[org.bytedeco/mkl-platform-redist "2020.1-1.5.3"]

to my dependencies would (and does) pull in the MKL native libs (the size looks right for the redist stuff in comparison to my local MKL installation). If I drop the existing MKL off my path, then add these deps, I would hope that lein/maven would magically populate them for me (as they do with JOGL and other libs), and then Neanderthal could pick them up without having a system-wide MKL installation.

If I do so, I get the familiar error

java.lang.NoClassDefFoundError: Could not initialize class uncomplicate.neanderthal.internal.host.MKL
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at clojure.lang.RT.classForName(RT.java:2211)
    at clojure.lang.RT.classForName(RT.java:2220)
    at clojure.lang.Compiler.maybeResolveIn(Compiler.java:7438)
    at clojure.core$ns_resolve.invokeStatic(core.clj:4370)
    at clojure.core$ns_resolve.invokeStatic(core.clj:4359)
    at clojure.core$ns_resolve.invoke(core.clj:4359)

when requiring uncomplicate.neanderthal.native, which in the past has indicated that not all the expected MKL binaries were on the path.

Am I missing a step? The other possibility is that the native deps don't include the exact binaries neanderthal expects (I have a list of the minimal ones necessary); I have not verified the binaries are consistent yet.

On Windows 10, under \org\bytedeco\mkl\2020.1-1.5.3\mkl-2020.1-1.5.3-windows-x86_64-redist.jar, we get:

libiomp5md.dll
mkl_avx2.dll
mkl_avx512.dll
mkl_avx.dll
mkl_core.dll
mkl_def.dll
mkl_intel_thread.dll
mkl_mc3.dll
mkl_mc.dll
mkl_rt.dll
mkl_vml_avx2.dll
mkl_vml_avx512.dll
mkl_vml_avx.dll
mkl_vml_cmpt.dll
mkl_vml_def.dll
mkl_vml_mc2.dll
mkl_vml_mc3.dll
mkl_vml_mc.dll

mkl_scalapack_lp64.dll is not in there (I seem to recall that it mattered the last time I tried this).

blueberry commented 4 years ago

When started, Neanderthal tries to load mkl_rt.dll from the path, which then takes care of loading the other libs. Maybe the issue is that you don't call bytedeco, so its loader does not have a chance to load the libraries. Try loading bytedeco first (I haven't tried it).

joinr commented 4 years ago

I have gotten it working, although I'm still trying to iron out an apparent sensitivity to importing org.bytedeco.mkl.global.mkl_rt prior to requiring uncomplicate.neanderthal.native. From the REPL, in the user ns, if I manually import and then require, things appear to work. If I use the ns declaration, the loading steps apparently don't happen in the same order, and neanderthal can't find the DLLs when it needs them. Still, it appears to be feasible.

joinr commented 4 years ago

Very interesting. If I'm working at the REPL and copy-pasting in import and require statements (with a little delay, say spread out over 5s), I can load things no problem. If the loading is automated, e.g. in a tests file, there appears to be a race condition. I do not know of a way to verify that the native stuff is loaded before proceeding. Currently putting in a dumb timeout, but that is duct tape...

joinr commented 4 years ago

Getting

C:\Users\joinr\AppData\Local\Temp\neanderthal-mkl-0.25.04108726012663603034.dll: Can't find dependent libraries

in the failure cases. Now it's acting stochastic at the REPL.

joinr commented 4 years ago

neandertest demo repository

joinr commented 4 years ago

Similar problem on Linux:

Execution error (UnsatisfiedLinkError) at java.lang.ClassLoader$NativeLibrary/load (ClassLoader.java:-2).
/tmp/libneanderthal-mkl-0.25.07678040401293332646.so: libmkl_rt.so: cannot open shared object file: No such file or directory

joinr commented 4 years ago

Lol, I waited like 20 minutes and tried again and it worked. wtf

joinr commented 4 years ago

Maybe there's something going on at the OS level with temp file permissions, since a dll now exists in the temp location. Could be a false positive for virus activity or something, dunno...

joinr commented 4 years ago

Adding

(import '[org.bytedeco.javacpp Loader])
(Loader/load org.bytedeco.mkl.global.mkl_rt)

per the "debugging unsatisfied link error" notes, works on Windows and allows lein test to pass, where before it never would.

joinr commented 4 years ago

Linux also now passes, where before it didn't.

joinr commented 4 years ago

So the require mechanisms appear to work now, if you refactor out the load jank into a separate namespace, like neandertest.mkl with

(ns neandertest.mkl
  (:import org.bytedeco.mkl.global.mkl_rt
           [org.bytedeco.javacpp Loader]))
(Loader/load org.bytedeco.mkl.global.mkl_rt)

and require it before uncomplicate.neanderthal.native. This is still somewhat mysterious to me. My best guess is that going through their loader ensures the class is loaded, and with it the native deps, which then live somewhere (there appears to be a cache they store as well). At least there is a proof of principle that this is a viable path for managing MKL native dependencies portably.
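For anyone following along, here is a minimal sketch of the consuming side, assuming the neandertest.mkl namespace above is on the classpath. The namespace name neandertest.core and the smoke-test function are made up for illustration. The point is only that libs in a :require clause load in the order listed, so the Loader/load call runs before Neanderthal's native namespace initializes.

```clojure
(ns neandertest.core                                  ; hypothetical namespace name
  (:require [neandertest.mkl]                         ; loaded first: runs Loader/load as a side effect
            [uncomplicate.neanderthal.native :refer [dv]]
            [uncomplicate.neanderthal.core :refer [dot]]))

(defn smoke-test
  "Computes a small dot product with native double vectors.
  It only returns normally if the MKL binaries were found and loaded."
  []
  (dot (dv 1 2 3) (dv 1 2 3)))
```

If the native libraries are not found, requiring this namespace should fail with the same NoClassDefFoundError or UnsatisfiedLinkError shown earlier in the thread, rather than inside smoke-test.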

joinr commented 4 years ago

Closing issue. Maybe others can follow these tracks if they decide to go down this route instead of relying on a system-wide install (and in my case, having to jump through some registration hoops to get the binaries I needed).

blueberry commented 4 years ago

Thank you for exploring this, Tom. I've updated Neanderthal to do this discovery, so the user does not need to do anything other than provide javacpp's mkl redist jar on the classpath, which will be auto-discovered. If the jar is not there, the system-wide MKL from the LD_LIBRARY_PATH will be used. Please test 0.35.0 if you have time.
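To illustrate the idea of classpath discovery with a system-wide fallback, here is a rough sketch. It is not Neanderthal's actual implementation. The namespace and function names are invented, and it assumes only that org.bytedeco.javacpp.Loader exposes a static load(Class) method, which is the call used earlier in this thread.

```clojure
(ns neandertest.discover)                             ; hypothetical namespace name

(defn load-bundled-mkl!
  "Tries to load bytedeco's bundled MKL through JavaCPP's Loader, looked up
  reflectively so this compiles even when the jars are absent.
  Returns :bundled when the bytedeco classes were found and loaded, or
  :system when they are not on the classpath, in which case the caller
  relies on an MKL installation found through the OS library path."
  []
  (try
    (let [mkl-class   (Class/forName "org.bytedeco.mkl.global.mkl_rt")
          loader      (Class/forName "org.bytedeco.javacpp.Loader")
          load-method (.getMethod loader "load" (into-array Class [Class]))]
      ;; static method, so the receiver argument is nil
      (.invoke load-method nil (object-array [mkl-class]))
      :bundled)
    (catch ClassNotFoundException _
      :system)))
```

Calling (load-bundled-mkl!) before requiring uncomplicate.neanderthal.native mimics the manual workaround above without a hard compile-time dependency on bytedeco. Link errors raised while loading the bundled binaries are deliberately not caught, so they still surface.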

BTW, you didn't need to register at Intel to acquire MKL. You could literally unpack bytedeco's jar in any folder, add that folder onto your library path, and that would work.
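A sketch of the unpacking step, using only the JDK's zip classes. The namespace and function names are hypothetical, and it assumes nothing about the jar's internal layout: every entry that looks like a native library is copied into one flat folder. That folder still has to be added to PATH on Windows or LD_LIBRARY_PATH on Linux by hand.

```clojure
(ns neandertest.unpack                                ; hypothetical namespace name
  (:require [clojure.java.io :as io])
  (:import [java.util.zip ZipEntry ZipFile]))

(defn native-lib?
  "True for entry names ending in .dll, .so or .dylib (optionally versioned)."
  [^String entry-name]
  (boolean (re-find #"\.(dll|so|dylib)(\.\d+)*$" entry-name)))

(defn unpack-native-libs!
  "Copies every native library found in the jar into target-dir, dropping
  the directory structure inside the jar. Returns the paths written."
  [jar-path target-dir]
  (let [target (io/file target-dir)]
    (.mkdirs target)
    (with-open [zip (ZipFile. (io/file jar-path))]
      ;; doall forces the copies to happen before the jar is closed
      (doall
       (for [^ZipEntry entry (enumeration-seq (.entries zip))
             :when (and (not (.isDirectory entry))
                        (native-lib? (.getName entry)))]
         (let [out (io/file target (.getName (io/file (.getName entry))))]
           (with-open [in (.getInputStream zip entry)]
             (io/copy in out))
           (.getPath out)))))))

(comment
  ;; "path/to" and "mkl-libs" are placeholders; the jar name is the one quoted above
  (unpack-native-libs! "path/to/mkl-2020.1-1.5.3-windows-x86_64-redist.jar"
                       "mkl-libs"))
```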

I still prefer the system-wide method, for many reasons, but I appreciate that this is a huge issue for many, many Clojure and Java programmers who do not have experience with fiddling with native OS paths and whatnot. So, now we have the best of both worlds!

Thanks a lot!

joinr commented 4 years ago

@blueberry That's great to hear. I just tried the repo under Windows 10 and Ubuntu, and both worked as expected (if you include the jar on the classpath, e.g. in dependencies), with only the require for uncomplicate.neanderthal.native being necessary. I think this greatly simplifies getting up and running with neanderthal in a cross-platform way for many users, although some use cases may not want to bring in that heavy dependency and will prefer a system-level install.
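For reference, a project.clj along these lines is roughly what that setup looks like. This is a sketch: the project name, project version, and Clojure version are placeholders, the Neanderthal version is the 0.35.0 mentioned above, and the bytedeco coordinates are the two from the start of this thread.

```clojure
;; project.clj (sketch; project name and Clojure version are placeholders)
(defproject neandertest "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.10.1"]
                 [uncomplicate/neanderthal "0.35.0"]
                 ;; pulls the MKL native binaries from Maven instead of a system install
                 [org.bytedeco/mkl-platform "2020.1-1.5.3"]
                 [org.bytedeco/mkl-platform-redist "2020.1-1.5.3"]])
```

With these on the classpath, a plain (require 'uncomplicate.neanderthal.native) is the only step described as necessary; leaving the bytedeco entries out falls back to the system-wide MKL.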

> BTW, you didn't need to register at Intel to acquire MKL. You could literally unpack bytedeco's jar in any folder, add that folder onto your library path, and that would work.

Hah, I realized this well after the fact. I was trying out neanderthal last year and went through this, before realizing just a few days ago that there were maven artifacts with the desired binaries.

> Thanks a lot!

You are very welcome. Same to you for sharing your work!