FYI: torch>=2.2.0 updated torch/utils/cpp_extension.py so that it no longer imports pkg_resources. Therefore, this problem should not occur with the latest version of vLLM, which uses torch==2.3.0.
This PR pins the exact Mamba version and stops updating conda, since that conda update is what pulled in the newer setuptools.
Fixes the build for TGI < 2.0.3, which ships a PyTorch older than 2.3.0.
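A minimal sketch of the idea in Dockerfile form (the Mambaforge version tag and download URL below are illustrative assumptions, not the exact values from this repo):

# Pin the Mambaforge installer instead of pulling "latest", and skip
# running conda update, so setuptools stays at whatever the installer ships.
ARG MAMBA_VERSION=23.1.0-1
RUN curl -fsSL -o ~/mambaforge.sh \
        "https://github.com/conda-forge/miniforge/releases/download/${MAMBA_VERSION}/Mambaforge-${MAMBA_VERSION}-Linux-x86_64.sh" && \
    bash ~/mambaforge.sh -b -p /opt/conda && \
    rm ~/mambaforge.sh
# Deliberately no "conda update -y -n base conda" step here:
# that update is what pulled in setuptools >= 70.

For reference, the failure this avoids looks like this: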
=> ERROR [exllama-kernels-builder 3/3] RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build 7.0s
------
> [exllama-kernels-builder 3/3] RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build:
3.031 /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /opt/conda/conda-bld/pytorch_1699449201336/work/torch/csrc/utils/tensor_numpy.cpp:84.)
3.031 device: torch.device = torch.device(torch._C._get_default_device()), # torch.device('cpu'),
5.614 Traceback (most recent call last):
5.615 File "/usr/src/setup.py", line 2, in <module>
5.615 from torch.utils.cpp_extension import BuildExtension, CUDAExtension
5.615 File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 28, in <module>
5.616 from pkg_resources import packaging # type: ignore[attr-defined]
5.616 ImportError: cannot import name 'packaging' from 'pkg_resources' (/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py)
------
Dockerfile:126
--------------------
124 | COPY --from=tgi /tgi/server/exllama_kernels/ .
125 |
126 | >>> RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build
127 |
128 | # Build Transformers exllama kernels
--------------------
ERROR: failed to solve: process "/bin/sh -c TORCH_CUDA_ARCH_LIST=\"8.0;8.6+PTX\" python setup.py build" did not complete successfully: exit code: 1
The release of setuptools 70 removed the packaging re-export from pkg_resources, which broke kernel builds on PyTorch versions older than 2.3.0.
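If bumping to a newer PyTorch is not an option, one possible workaround (assuming the build stage uses the /opt/conda Python shown in the traceback above) is to pin setuptools below 70 right before building the kernels:

RUN python -m pip install "setuptools<70" && \
    TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build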