Open PhilipVinc opened 2 weeks ago
Downloading the file and trying to unzip it manually also fails. I think they are corrupted.
@rahulbatra85 FYI.
Also, while I'm pinging you, I'll note that we're going to drop support for monolithic CUDA jaxlib wheels in the next release, in favor of plugin wheels. ROCM should switch or your build will break...
@hawkinsp Thanks for the ping. Yeah, I saw it in the release notes. We are working on pushing out changes for the ROCm PjRT plugin.
Thanks!
@rahulbatra85 if you are changing your build infrastructure, can I give you another piece of feedback?
Your wheels ARE NOT manylinux2014 compliant, even if you tag them as such! Manylinux2014 means that you should require GLIBC/GLIBCXX from 2014 (circa glibc 2.17), but instead your wheels link to the relatively recent GLIBCXX_3.4.26 and GLIBC_2.29. (I tested the most recent working one, jaxlib-0.4.26+rocm610-cp311-cp311-manylinux2014_x86_64.whl )
This 1) means your wheels are not compliant, and 2) makes it very hard to run on HPC environments.
I've been struggling for the last few months to run JAX on (France's) Cray HPC hardware with AMD GPUs, and it's really a pain. A few releases ago you bumped those GLIBC and GLIBCXX versions, and now it's hard to get it running at all.
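The check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the thread: it assumes you already have a text dump of the versioned symbols a shared library requires (for example from `objdump -T` on a `.so` extracted from the wheel), and it assumes the manylinux2014 ceilings from PEP 599 (GLIBC 2.17, GLIBCXX 3.4.19). The function name `too_new_symbols` and the sample string are made up for the example.

```python
import re

# Assumed ceilings for manylinux2014 (PEP 599, CentOS 7 baseline).
MANYLINUX2014_LIMITS = {"GLIBC": (2, 17), "GLIBCXX": (3, 4, 19)}

# Matches versioned symbol names such as GLIBC_2.29 or GLIBCXX_3.4.26.
_VERSION_RE = re.compile(r"\b(GLIBCXX|GLIBC)_(\d+(?:\.\d+)*)\b")


def too_new_symbols(text, limits=MANYLINUX2014_LIMITS):
    """Return the symbol versions found in `text` that exceed `limits`."""
    offenders = set()
    for family, version in _VERSION_RE.findall(text):
        parsed = tuple(int(part) for part in version.split("."))
        if parsed > limits[family]:
            offenders.add(f"{family}_{version}")
    return sorted(offenders)


# Hypothetical dump containing the versions reported in this thread.
sample = "GLIBC_2.2.5 GLIBC_2.29 GLIBCXX_3.4.26"
print(too_new_symbols(sample))
```

Anything the function returns is a symbol version that a true manylinux2014 system is not guaranteed to provide.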
> Your wheels ARE NOT manylinux2014 compliant, even if you tag them as such!
We are aware of this and are working to fix it for the JAX builds. Currently these wheels are most likely Ubuntu 20.04+ compliant, since I believe that is what they are being created with.
> Downloading the file and trying to unzip it manually also fails. I think they are corrupted.
It appears that only two of the 6 wheels in the release have this issue:
I have checked the others and they seem to work properly, so those could be used instead as a workaround for now.
I may need to rebuild the busted wheels and re-upload as I am not sure what caused them to become unusable.
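Since a wheel is a zip archive, a quick integrity check before uploading (or after downloading) could look like the sketch below. It uses only the standard library; the helper name `wheel_is_readable` is hypothetical, and this is one possible check rather than what the ROCm release process actually runs.

```python
import zipfile
import zlib


def wheel_is_readable(path):
    """Return True if `path` is a zip archive whose members all pass CRC checks."""
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, zlib.error, OSError):
        return False
```

Running this over every file in a release would flag the wheels that fail to unzip without having to try a `pip install` for each one.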
I recommend running auditwheel in CI and verifying the tag you expect is the tag you get.
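As a rough sketch of the "tag you expect is the tag you get" idea: the platform tag a wheel claims is the last dash-separated field of its filename, so a CI step can compare it against whatever tag `auditwheel show` reports for the same file. The code below only handles the filename side; `audited_tag` is a hypothetical input that you would obtain from auditwheel yourself, and both function names are invented for this example.

```python
def platform_tags(wheel_filename):
    """Return the platform tags encoded in a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # The platform tag is the last '-' field; compressed tag sets use '.'.
    return stem.rsplit("-", 1)[1].split(".")


def tag_matches(wheel_filename, audited_tag):
    """Return True if the filename claims the tag that the audit reported."""
    return audited_tag in platform_tags(wheel_filename)


name = "jaxlib-0.4.26+rocm610-cp311-cp311-manylinux2014_x86_64.whl"
print(platform_tags(name))
```

A CI job would fail the build whenever `tag_matches` returns False, which is the situation described in this thread.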
Thanks! In order to plan ahead, do you have a timeline for fixing the manylinux compliance?
Re-uploaded working versions of:
> Thanks! In order to plan ahead, do you have a timeline for fixing the manylinux compliance?
I'm not sure what the ETA on that would be yet. We are doing that as part of the update to manylinux_2_28, since manylinux2014 is out of support in a couple months I believe.
> I recommend running auditwheel in CI and verifying the tag you expect is the tag you get.
auditwheel is definitely part of our real manylinux builds with other frameworks, but our JAX stuff came into that work late :/ It's definitely something in the pipeline however.
@PhilipVinc Out of curiosity, what glibc/libstdc++ version can you support?
We're wondering to what standard to bump JAX's main releases, and two options are: a) manylinux_2_28 (glibc 2.28, glibcxx 3.4.24) b) manylinux_2_31 (glibc 2.31, glibcxx 3.4.28).
You noted that AMD's wheels were too new for you, so I'm curious what standards you can accept and whether it's possible to upgrade. Is 2_28 possible? I think that's the likely outcome.
Hey @hawkinsp. I will double check tomorrow but IIRC the Cray HPC system with AMD GPUs (France's 2nd largest HPC) has glibc 2.28 at most.
If you want, I will double check other big European HPC clusters as well to give you some datapoints, but in general do expect them to be outdated.
I'm sorry to state that, but they are on average slow to upgrade, so it's unfortunately on you in some sense to be conservative.
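For anyone wanting to collect the same datapoint on their own cluster, here is a small sketch using the standard library. It relies on the PEP 600 rule that a `manylinux_X_Y` wheel needs glibc X.Y or newer; the function names are made up for the example, and `platform.libc_ver()` inspects the running Python executable, so it should be run with the interpreter you intend to use on the compute nodes.

```python
import platform


def glibc_version():
    """Return the glibc version as a tuple of ints, or None if not glibc."""
    name, version = platform.libc_ver()
    if name != "glibc" or not version:
        return None
    return tuple(int(part) for part in version.split("."))


def supports_manylinux(glibc, major, minor):
    """Return True if a system with `glibc` can load manylinux_{major}_{minor} wheels."""
    return glibc is not None and glibc >= (major, minor)


detected = glibc_version()
print("glibc:", detected)
print("manylinux_2_28 ok:", supports_manylinux(detected, 2, 28))
print("manylinux_2_31 ok:", supports_manylinux(detected, 2, 31))
```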
manylinux_2_28 roughly corresponds to a release from Aug 2018, and I'm tempted to say "6 years is a long enough support window". It also happens to be the next newest version at which the manylinux project has docker images: https://github.com/pypa/manylinux, so I'd expect wide adoption as soon as the manylinux2014 CentOS reaches end of life.
So I checked on the clusters I have access to:
From a quick chat with the support teams, it seems that the problem is that Cray is very slow to release updated versions of their custom software stack, while the other vendor is less constraining on them.
I agree with you that 6 years is a long enough support window.
I think if you go with manylinux_2_28 it should be ok for the vast majority of users.
Description
The wheel files distributed at https://github.com/ROCm/jax/releases are invalid. See error
System info (python version, jaxlib version, accelerator, etc.)
(jax-env) [cad14908] fvicentini@login5:~$ pip --version
pip 24.0 from /lus/home/CT5/cad14908/fvicentini/jax-env/lib/python3.11/site-packages/pip (python 3.11)
(jax-env) [cad14908] fvicentini@login5:~$ python --version
Python 3.11.5