jcuda / jcuda-main

Summarizes the main JCuda libraries
MIT License

JCuda 11.5.2 binary incompatibility #45

Closed corepointer closed 2 years ago

corepointer commented 2 years ago

Hi!

The Linux binaries of JCuda 11.5.2 seem to be incompatible with at least Ubuntu 20.04 (latest LTS) and Debian 11. They seem to have been compiled on a more recent system with a newer glibc. If I compile from source, everything works fine. If I use the ones downloaded by maven, I get the error:

java.lang.UnsatisfiedLinkError: /tmp/libJCublas2-11.5.2-linux-x86_64.so: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /tmp/libJCublas2-11.5.2-linux-x86_64.so)

Glibc version in Ubuntu20/Debian11 is 2.31:

/lib/x86_64-linux-gnu/libc.so.6 
GNU C Library (Debian GLIBC 2.31-12) stable release version 2.31.
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 10.2.1 20210110.
libc ABIs: UNIQUE IFUNC ABSOLUTE
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
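
The mismatch reported above can also be detected programmatically. The following is a minimal sketch, assuming the message formats shown in this report; `GlibcMessages` and its methods are hypothetical helpers for illustration and are not part of JCuda.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of JCuda): extracts glibc versions from text. */
final class GlibcMessages {
    // Matches the symbol version in: version `GLIBC_2.34' not found
    private static final Pattern REQUIRED =
            Pattern.compile("version `GLIBC_(\\d+(?:\\.\\d+)*)' not found");
    // Matches the banner printed by libc.so.6: "... release version 2.31."
    private static final Pattern INSTALLED =
            Pattern.compile("release version (\\d+(?:\\.\\d+)*)");

    private GlibcMessages() {}

    /** Version the native library asks for, taken from an UnsatisfiedLinkError message. */
    static Optional<String> requiredVersion(String linkErrorMessage) {
        return firstGroup(REQUIRED, linkErrorMessage);
    }

    /** Version the system provides, taken from the libc.so.6 banner text. */
    static Optional<String> installedVersion(String libcBanner) {
        return firstGroup(INSTALLED, libcBanner);
    }

    private static Optional<String> firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

An application could catch the `UnsatisfiedLinkError`, pass `getMessage()` to `requiredVersion`, and print a clearer hint about which glibc the bundled binaries expect.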

jcuda commented 2 years ago

I have to ask @blueberry here: Is there any information about the Linux version that you are using, and the installed GLIBC version? (There had been a similar issue a few years ago ( https://github.com/jcuda/jcuda-main/issues/10 ), but it's probably just a general version mismatch...)

blueberry commented 2 years ago

I'm using Arch linux with glibc 2.35. (I guess that's the default there, since I don't remember explicitly changing it).

jcuda commented 2 years ago

As usual, I'm not sure about common workflows on Linux: Is it reasonable for people to update their installed GLIBC, or is this so much a "core" library that it should usually not be touched by users?

blueberry commented 2 years ago

As with any other library, you can use newer libc in a specific process, by setting up one or two environment variables (for your process only or whatever you like).

corepointer commented 2 years ago

The C library (glibc in most cases on Linux) is a core component of any system, as all programs rely on it in some form. If you mess with it, your system is gone (been there, done that). Requiring users to install another glibc and set up the right environment is not something I would ask of an ordinary user. Now the question is whether a newer glibc can run binaries built against an older one: compiling for glibc 2.31 and running on 2.34+.

blueberry commented 2 years ago

That is correct, but I'm not talking about installing glibc anywhere where it could be mistaken for your original version of glibc. I'm simply talking about putting glibc.so (or whatever binary file is appropriate) in a directory specific to a project and including it in LD_LIBRARY_PATH (or, again, whatever else is appropriate) for that process only, not globally. I don't see how that would mess up the system.

An ordinary user of JCuda is, I believe, a programmer, who should be able to handle something like this. An ordinary user of that programmer's program is not a programmer and is probably not able to do this, but, OTOH, that user is probably not able to use even java -jar theprogram.jar, but relies on running a shell script, which could set up the process to use the appropriate glibc shipped inside the program's directory?
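
The per-process setup described above could also be done from a small Java launcher instead of a shell script. This is only a sketch of the scoping idea: `ScopedLauncher`, the library directory, and the JAR name are hypothetical placeholders. Note also that, in practice, a different glibc generally needs its matching dynamic loader as well, so setting LD_LIBRARY_PATH alone may not be sufficient for glibc itself.

```java
import java.io.File;
import java.util.List;
import java.util.Map;

/** Hypothetical launcher sketch; directory and JAR names are placeholders. */
final class ScopedLauncher {
    private ScopedLauncher() {}

    /**
     * Prepares (but does not start) a child JVM whose LD_LIBRARY_PATH
     * begins with the given project-local directory.
     */
    static ProcessBuilder prepare(String libDir, String jarPath) {
        ProcessBuilder pb = new ProcessBuilder(List.of("java", "-jar", jarPath));
        // environment() is a private copy of the parent's environment,
        // so changes here are visible to the child process only.
        Map<String, String> env = pb.environment();
        String existing = env.get("LD_LIBRARY_PATH");
        String value = (existing == null || existing.isEmpty())
                ? libDir
                : libDir + File.pathSeparator + existing;
        env.put("LD_LIBRARY_PATH", value);
        pb.inheritIO(); // forward the child's stdout/stderr to this console
        return pb;
    }
}
```

Calling `ScopedLauncher.prepare("/opt/myapp/native", "theprogram.jar").start()` would launch the program with the extra directory on its library path, leaving the environment of the parent process and the rest of the system untouched.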

I agree that it would be best if this was compiled with a glibc version that would work on all systems. There's only a question then which versions to support, as not every LTS of every distribution is at the same version of glibc.

blueberry commented 2 years ago

Maybe the best solution would be to build the awaited version 11.6.0 on Ubuntu LTS?

corepointer commented 2 years ago

I agree, setting this up can be asked of a programmer who uses jcuda. But the situation is far from ideal. It unnecessarily blows up package size and what if other native shared libraries that are used concurrently in some java program require the original glibc? Trying this out on Ubuntu20/Debian11 (same glibc version) seems reasonable imho. A majority of desktop users will use this or something newer. Server users tend to be conservative with upgrades but can be expected to handle containerized setups (docker/singularity/etc). Using singularity myself on some hpc systems. That works quite well :+1:

jcuda commented 2 years ago

From my understanding (and please correct me if I'm wrong), there are roughly these options;

(And both of these options are not something that a user should have to do. A user should just declare the Maven dependency and be done...)

I'm not sure what would be necessary for the latter. I could imagine that ( @blueberry ) it could be technically possible to create a 11.5.2b release with the older dependency. You'd probably have to manually unpack this older GLIBC version somewhere and declare it as dependency in CMake or so. Whether or not that's worth the effort is hard to say. But... I assume that something similar would have to be done to avoid the problem in 11.6.0 as well.

> compiling for glibc 2.31 and running on 2.34+.

Iff these version numbers roughly follow 'semantic versioning', then this should be fine, and one could probably say that the dependency on GLIBC should be ~"the oldest version that works". If it was possible to compile this with, say, version 2.01 and this version still worked for someone who had installed 2.34, the problem could be solved.

Still, I'm not sure how exactly someone with a "modern" Linux could configure the build so that it only requires an "old" GLIBC version...
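
The "oldest version that works" rule can be written down as a small check. The sketch below assumes, as this thread does, that a newer glibc can load binaries that require an older one; `GlibcBaseline` is a hypothetical helper, and the version numbers in the usage note are the ones mentioned in this issue. Versions are compared numerically per component, because a plain string comparison would wrongly order 2.4 after 2.31.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: picks a glibc build baseline for a set of target systems. */
final class GlibcBaseline {
    /** Compares dotted versions numerically, so that 2.4 sorts before 2.31. */
    static final Comparator<String> BY_VERSION = (a, b) -> {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0; // missing components count as 0
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    };

    private GlibcBaseline() {}

    private static int[] parse(String version) {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    /** True if a binary requiring {@code required} should load with {@code installed}. */
    static boolean canLoad(String required, String installed) {
        return BY_VERSION.compare(required, installed) <= 0;
    }

    /** The oldest glibc among the target systems, i.e. the version to build against. */
    static String baseline(List<String> targetVersions) {
        return targetVersions.stream()
                .min(BY_VERSION)
                .orElseThrow(() -> new IllegalArgumentException("no target versions"));
    }
}
```

With the systems named here, `canLoad("2.34", "2.31")` is false (the reported failure), while `baseline(List.of("2.35", "2.31"))` yields 2.31, and a binary built for that baseline passes `canLoad` on both Arch (2.35) and Ubuntu 20.04/Debian 11 (2.31).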

blueberry commented 2 years ago

Yes, if necessary I'll compile it with an older glibc that I can put in an appropriate directory inside the jcuda build infrastructure, but I don't know how to set up cmake to do that, as I find it utterly confusing.

blueberry commented 2 years ago

OTOH, the next Ubuntu LTS 22.04 is due in April, I guess it'll use current glibc, so I think this issue already solved itself...

corepointer commented 2 years ago

That might be true for desktop systems. On cluster environments I have to put up with Debian11, Ubuntu20, RHEL8, CentOS7. But as discussed, server users should know how to set up their environment. On these systems the bigger problem is to get recent drivers installed (which the admins of these systems are very reluctant to do, so I'm stuck with CUDA 11.2 - problem solved) :face_exhaling:

jcuda commented 2 years ago

(Sorry for the delay).

Admittedly, I'm not entirely sure how to proceed here.

To my understanding, it would be preferable to have the library compiled with a GLIBC that is ~"as old as possible to work, and in a version that common Linux distributions are downward compatible with". But I don't know the details about how that could be achieved technically.

If somebody can provide JCuda binaries for 11.5.2 that are compiled with such an "older" GLIBC version, then I could try to bundle this and publish it as 11.5.2b.

The same question will come up in view of https://github.com/jcuda/jcuda-main/issues/44 - maybe compiling the 11.6.1 binaries with such an "old" GLIBC could solve the issue for most users.

Any ideas?

(If in doubt, I could also try to revive my Linux VM and see whether I can compile the binaries with an older version there, but that tends to be a pain in the back, and I'll never be able to test these binaries in a VM...)

blueberry commented 2 years ago

@jcuda I can at least test these binaries...

corepointer commented 2 years ago

As I've really become a fan of singularity lately, I can compile & test different versions quite flexibly. I'll be in need of binaries for CUDA 11.2 and 11.3 as well, because some cluster environments I am using have older drivers that don't support newer CUDA SDKs. I haven't tested yet whether the current binaries work there. I can help out with compile & test if you want (potentially with some delay though). How would these binaries be delivered/published?

jcuda commented 2 years ago

@corepointer The binaries for 11.2 should already be available, as of https://github.com/jcuda/jcuda-main/issues/39#issuecomment-823521412 (11.3 was essentially skipped - it's sometimes hard to allocate the time to make regular, quick and frequent updates...). I don't know whether the binaries for 11.2 have the same GLIBC issue, but it would be good to know.

> How would these binaries be delivered/published?

The build instructions at https://github.com/jcuda/jcuda-main/blob/master/BUILDING.md describe the overall process. The result of that process should be a bunch of JAR files in the jcuda-main/output directory that include the JARs with the Linux-specific native libraries in them - these JARs are all that I need in order to create the Maven release.

jcuda commented 2 years ago

I'll close that for now. We know the reason for the incompatibility, and how to 'solve' it. (Essentially: Build JCuda with the 'oldest' version of GLIBC that can be used). One compatibility release for 11.6.1 was released in https://github.com/jcuda/jcuda-main/issues/52 . I'll probably not create compatibility releases for older versions, but we can keep an eye on the GLIBC version for future releases. Thanks again @corepointer for the binaries provided in https://github.com/jcuda/jcuda-main/issues/51#issuecomment-1164390291