turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs
MIT License

Prebuilt CUDA wheels no longer run on Ubuntu 20.04 #338

Closed. jriesen closed this issue 3 months ago

jriesen commented 4 months ago

On an Ubuntu 20.04 system, exllamav2 refuses to load a model in a new (cloned from main) installation of oobabooga-textgen-webui, giving the error:

ImportError: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.32' not found (required by /home/user/text-generation-webui/installer_files/env/lib/python3.11/site-packages/exllamav2_ext.cpython-311-x86_64-linux-gnu.so)

Ubuntu 20.04 ships glibc 2.31, so evidently the wheel was built on too new an operating system. The wheels that textgen lists in its requirements.txt are built from its own nearly identical downstream fork of this project, so I figured I would open this issue here upstream rather than bother them and get sent here anyway.
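
For anyone who wants to confirm the mismatch on their own machine, here is a minimal Python sketch. It pulls the required version out of the error text and compares it with the glibc the system reports. The helper names (required_glibc, system_glibc, can_load) are made up for illustration and are not part of exllamav2; platform.libc_ver() is from the standard library.

```python
import platform
import re
from typing import Optional, Tuple

Version = Tuple[int, int]


def required_glibc(error_message: str) -> Optional[Version]:
    """Extract the (major, minor) glibc version named in an ImportError message."""
    match = re.search(r"GLIBC_(\d+)\.(\d+)", error_message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def system_glibc() -> Optional[Version]:
    """Return the glibc version of the running system, or None if it is not glibc."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def can_load(system: Version, required: Version) -> bool:
    """A binary can load only if the system glibc is at least the version it requires."""
    return system >= required


# Shortened form of the error message quoted above.
message = "libc.so.6: version 'GLIBC_2.32' not found"
needed = required_glibc(message)
print("required by the wheel:", needed)
print("loads on glibc 2.31 (Ubuntu 20.04):", can_load((2, 31), needed))
print("glibc on this system:", system_glibc())
```

Versions are compared as integer tuples rather than strings so that, for example, 2.9 sorts below 2.31.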

It looks like commit 8be8867 changed the build-wheels-release.yml GitHub Action to only use ubuntu-latest, breaking support for earlier operating systems, including Ubuntu 20.04.

I am of course aware that Ubuntu 22 exists and should get around to upgrading this OS at some point, but Ubuntu 20.04 isn't that old. Is dropping support for Ubuntu < 20.10 official and/or intentional for the prebuilt CUDA wheels? Especially since build-wheels-release-rocm.yml was not similarly modified and still builds with 20.04?

(I can, of course, build them from source myself against the appropriate CUDA version etc. but I wanted to verify this wasn't accidental before either going down that path or giving up and updating to Ubuntu 22.)

elvenwhite commented 4 months ago

Same issue here. I don't believe it's intentional to drop support for Ubuntu 20.04.

Ph0rk0z commented 4 months ago

In the meantime you should build from source. It's really easy on Linux.

turboderp commented 4 months ago

It's not intentional, no. I can look into what it would take to support Ubuntu 20.04, and I'm not against building on that if the resulting binaries would still work on later versions. It's just really hard to stay ahead of all these breaking changes, and I just don't have the time to maintain 30 different setups to test all the wheels for every release.

I'd happily take contributions from someone more familiar with the Ubuntu ecosystem. The build actions are kind of a mess anyway and could use an overhaul.

Building from source is also pretty easy though, especially on Linux. Just install the CUDA Toolkit, and then pip install . from the repo.

neggles commented 4 months ago

@turboderp Linux binaries are generally forward-compatible and rarely backward-compatible: something built against an older glibc runs on newer systems, but not the other way around. For example, most prebuilt Python wheels are built to the manylinux2014 "standard", which is basically the glibc (2.17) and other system libraries that shipped with CentOS 7. This makes those wheels run just fine on pretty much any vaguely modern Linux platform, since they won't try to use any features of glibc etc. that were only present in later versions.

In a similar vein, if you build on Ubuntu 20.04 LTS you can expect those binaries to run on any later version of Ubuntu; as such, for stuff like this, best practice is to use the oldest currently-supported Ubuntu LTS version for builds. If you build on 20.04, it'll run on 20.04, 22.04, current 23.10, and upcoming 24.04 as well.

This usually works out for non-Ubuntu distros too, since the oldest supported Ubuntu LTS is usually just about the oldest set of system libraries that are actually in use across a meaningful number of systems. There won't be any meaningful performance differences when building on 20.04 vs 22.04 for this kind of code either since the overwhelming majority of the work being done is CUDA shenanigans.

tl;dr "build on 20.04 and it'll run wherever"
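
A rough way to check what a built extension actually demands is to scan the .so for GLIBC_x.y version strings and take the highest one. The Python sketch below does that. It is a heuristic (roughly what grepping the output of strings would give you), not a real ELF parser, and the function names are made up for illustration. The tag format in manylinux_tag follows the manylinux_X_Y naming from PEP 600.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

Version = Tuple[int, int]

# Matches symbol-version strings such as GLIBC_2.17 or GLIBC_2.2.5 (major.minor only).
GLIBC_TAG = re.compile(rb"GLIBC_(\d+)\.(\d+)")


def max_glibc_in_bytes(data: bytes) -> Optional[Version]:
    """Return the highest GLIBC_x.y version string found in raw binary data."""
    versions = {(int(major), int(minor)) for major, minor in GLIBC_TAG.findall(data)}
    return max(versions) if versions else None


def max_glibc_required(path: str) -> Optional[Version]:
    """Heuristic: scan a shared object on disk for the newest glibc version it mentions."""
    return max_glibc_in_bytes(Path(path).read_bytes())


def manylinux_tag(glibc: Version, arch: str = "x86_64") -> str:
    """Build the PEP 600 platform tag corresponding to a minimum glibc version."""
    return f"manylinux_{glibc[0]}_{glibc[1]}_{arch}"


# Synthetic bytes standing in for a real .so file.
sample = b"\x00GLIBC_2.2.5\x00GLIBC_2.17\x00GLIBC_2.32\x00"
newest = max_glibc_in_bytes(sample)
print(newest, manylinux_tag(newest))
```

Pointing max_glibc_required at the exllamav2_ext .so in site-packages should show whether a given wheel will load on a 2.31 system before you try to import it.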

Nixellion commented 4 months ago

This also affects Debian 11, which is for now still the primary Debian version that people might want to use on servers. Debian 12 exists, but not everyone has moved yet because of potential compatibility issues with the software they run.

Debian 11 is on glibc 2.31, the same as Ubuntu 20.04, so the current wheels fail there too. (Debian 12 ships glibc 2.36, so it is not affected.)

So you'd be dropping more than Ubuntu 20.04; you'd also be dropping Debian 11, which is still widely deployed.

Omegastick commented 3 months ago

As of https://github.com/turboderp/exllamav2/pull/360 this should be fixed.