huggingface / Google-Cloud-Containers

Hugging Face Deep Learning Containers (DLCs) for Google Cloud
https://hf.co/docs/google-cloud
Apache License 2.0

Fix setuptools install <70 #39

Closed philschmid closed 5 months ago

philschmid commented 5 months ago

The release of setuptools 70 broke builds that use older PyTorch versions (< 2.3.0).

FYI: torch>=2.2.0 updates torch/utils/cpp_extension.py to no longer import pkg_resources. Therefore, this problem should not occur with the latest version of vLLM which uses torch==2.3.0.
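As a rough illustration of the version boundary described above, the following sketch checks whether an installed setuptools falls at or above the release that broke the build. The helper names `setuptools_major` and `needs_setuptools_pin` are hypothetical and not part of this PR; the threshold of 70 is taken from the description above, and the parsing assumes a plain dotted version string.

```python
from importlib import metadata


def setuptools_major(version: str) -> int:
    """Return the major component of a dotted version string such as '69.5.1'."""
    return int(version.split(".")[0])


def needs_setuptools_pin(version: str, first_broken_major: int = 70) -> bool:
    """Return True if this setuptools version is at or above the breaking release.

    The default threshold (70) comes from the PR description; it is an
    assumption of this sketch, not something derived from setuptools itself.
    """
    return setuptools_major(version) >= first_broken_major


if __name__ == "__main__":
    try:
        installed = metadata.version("setuptools")
    except metadata.PackageNotFoundError:
        print("setuptools is not installed")
    else:
        print(installed, "needs pin:", needs_setuptools_pin(installed))
```

Running a check like this inside the builder stage, before `python setup.py build`, would presumably show whether the conda update pulled in a newer setuptools than expected.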

This PR makes sure that we use the exact Mamba version and do not update conda, because the conda update is what pulled in the newer setuptools.

Fixes the build for TGI versions < 2.0.3 (TGI 2.0.3 is the release that has PyTorch 2.3.0).

 => ERROR [exllama-kernels-builder 3/3] RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build                                                                                                                7.0s
------
 > [exllama-kernels-builder 3/3] RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build:
3.031 /opt/conda/lib/python3.10/site-packages/torch/nn/modules/transformer.py:20: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /opt/conda/conda-bld/pytorch_1699449201336/work/torch/csrc/utils/tensor_numpy.cpp:84.)
3.031   device: torch.device = torch.device(torch._C._get_default_device()),  # torch.device('cpu'),
5.614 Traceback (most recent call last):
5.615   File "/usr/src/setup.py", line 2, in <module>
5.615     from torch.utils.cpp_extension import BuildExtension, CUDAExtension
5.615   File "/opt/conda/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 28, in <module>
5.616     from pkg_resources import packaging  # type: ignore[attr-defined]
5.616 ImportError: cannot import name 'packaging' from 'pkg_resources' (/opt/conda/lib/python3.10/site-packages/pkg_resources/__init__.py)
------
Dockerfile:126
--------------------
 124 |     COPY --from=tgi /tgi/server/exllama_kernels/ .
 125 |     
 126 | >>> RUN TORCH_CUDA_ARCH_LIST="8.0;8.6+PTX" python setup.py build
 127 |     
 128 |     # Build Transformers exllama kernels
--------------------
ERROR: failed to solve: process "/bin/sh -c TORCH_CUDA_ARCH_LIST=\"8.0;8.6+PTX\" python setup.py build" did not complete successfully: exit code: 1
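
The import failure in the log above can probably be checked without a full image build. This is a minimal sketch, assuming it runs in the same Python environment as the build step; the function name `pkg_resources_exposes_packaging` is hypothetical. It attempts the same import that `torch/utils/cpp_extension.py` performs at line 28 in the traceback.

```python
def pkg_resources_exposes_packaging() -> bool:
    """Return True if `from pkg_resources import packaging` succeeds.

    This mirrors the import statement shown in the traceback. It returns
    False both when pkg_resources is missing entirely and when it no longer
    exposes a `packaging` attribute.
    """
    try:
        from pkg_resources import packaging  # noqa: F401
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if pkg_resources_exposes_packaging():
        print("pkg_resources.packaging is importable")
    else:
        print("pkg_resources.packaging is NOT importable; the kernel build would likely fail")
```

A `False` result in the builder image would suggest the setuptools pin did not take effect.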