uncomplicate / neanderthal

Fast Clojure Matrix Library
http://neanderthal.uncomplicate.org
Eclipse Public License 1.0

"minimal" example Dockerfile #128

Closed behrica closed 4 months ago

behrica commented 2 years ago

It sets up MKL, CUDA and OpenCL on top of the Ubuntu 20.04 Docker image.

Maybe it is worth keeping and mentioning as an additional way of setting up Neanderthal.

I went for CUDA 11.4, so we need to override the JCuda deps, as shown in deps.edn.minimal.

blueberry commented 2 years ago

I am pretty sure CUDA 11.4.1 will not work with Deep Diamond, as Nvidia changes the API from time to time, and several neural network functions that I use (and perhaps many more) simply do not exist in 11.4, or have a different signature, or plainly work differently under the hood. Neanderthal could work, but who knows. ClojureCUDA should work with 11.4 without problems, but I can't be sure.

behrica commented 2 years ago

I thought that Deep Diamond only uses Neanderthal.

So "all tests passing" is not enough as a test?

behrica commented 2 years ago

CUDA 11.6 has this glibc issue we talked about before.

I still think it is useful to keep it "somewhere", as a reference if somebody is struggling to set things up (myself included).

Maybe we can keep the idea of one or more "example Dockerfiles". This one is for CUDA 11.4, with all tests passing.

I can try again with CUDA 11.6, as another "example". So as a kind of "living instructions". Not sure whether you find that useful.

blueberry commented 2 years ago

As long as it helps any user, I find it useful. I'm just afraid that it can become a complex but broken solution to a much simpler problem. If 11.6 does not work due to an old glibc, the right solutions are:

1) update the operating system to a more recent version with a recent glibc, or,
2) if that is not possible for whatever reason, build JCuda itself on the system with the older glibc, and contribute that build upstream to JCuda.

behrica commented 2 years ago

I will give it a try with CUDA 11.6. It's just a few changes in the Dockerfile, so we will see quickly if that works out.

Going to Ubuntu 22.04 indeed means CUDA 11.7, for which there is no JCuda release (yet).

At least I am learning a lot ...

behrica commented 2 years ago

Maybe starting from here is even better:

https://hub.docker.com/layers/cuda/nvidia/cuda/11.6.1-runtime-ubuntu20.04/images/sha256-b59497e63c4d8cefac1152ceeb564830ed2f46e7d417c822a5813464a10394d2?context=explore

blueberry commented 2 years ago

It should be possible to install the earlier CUDA 11.6 on Ubuntu 22.04; it's just not the default. 11.7 came out fairly recently, and I bet most of the CUDA-dependent software that people run on Ubuntu still needs earlier versions, so earlier versions should be available.

behrica commented 2 years ago

The "official" NVIDIA downloads do not have it: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/

behrica commented 2 years ago

OK, now it's crystal clear. JCuda 11.6 does not work on Ubuntu 20.04 (and likely not on a lot of other distributions).

Using the modified Dockerfile (and the NVIDIA Docker image) gives:

/tmp/libJCudaDriver-11.6.1-linux-x86_64.so: /usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/libJCudaDriver-11.6.1-linux-x86_64.so)

As it was hinted before.

So Arch users are lucky, because we have (GNU libc) 2.35
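
For anyone who wants to check this before pulling a JCuda release, here is a minimal Python sketch (not part of the Dockerfile or of JCuda). It extracts the required glibc version from a loader error like the one above and compares it with the glibc of the running system. The helper names `required_glibc`, `parse_version` and `glibc_is_new_enough` are made up for illustration; `platform.libc_ver()` is the standard-library call and may return empty strings on non-glibc systems.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, int]

# Loader error copied from the comment above.
ERROR = (
    "/tmp/libJCudaDriver-11.6.1-linux-x86_64.so: "
    "/usr/lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found "
    "(required by /tmp/libJCudaDriver-11.6.1-linux-x86_64.so)"
)


def required_glibc(loader_error: str) -> Optional[Version]:
    """Return the highest GLIBC_x.y version named in the message, or None."""
    found = re.findall(r"GLIBC_(\d+)\.(\d+)", loader_error)
    versions = [(int(major), int(minor)) for major, minor in found]
    return max(versions) if versions else None


def parse_version(text: str) -> Optional[Version]:
    """Parse the leading 'x.y' of a version string such as '2.31'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def glibc_is_new_enough(available: Version, required: Version) -> bool:
    """Tuples compare element by element, so (2, 35) >= (2, 34) holds."""
    return available >= required


if __name__ == "__main__":
    needed = required_glibc(ERROR)
    lib, version = platform.libc_ver()
    available = parse_version(version) if lib == "glibc" else None
    if needed is None or available is None:
        print("Could not determine the glibc versions to compare.")
    else:
        verdict = "OK" if glibc_is_new_enough(available, needed) else "too old"
        print(f"needs glibc {needed}, system has {available}: {verdict}")
```

Ubuntu 20.04 ships glibc 2.31, so the check fails there, while a system with glibc 2.35 passes.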

blueberry commented 2 years ago

The "official" NVIDIA downloads do not have it: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/

They do have it: there is a separate generic Linux distribution of CUDA that covers distributions other than Ubuntu and Fedora, and it should be compatible with all Linux distros, including Ubuntu and Fedora. The installer is run as a shell script, and Arch Linux and other distros not officially supported by Nvidia wrap it in their package managers.

behrica commented 4 months ago

outdated now