huggingface / text-generation-inference

Large Language Model Text Generation Inference
http://hf.co/docs/text-generation-inference
Apache License 2.0
converting docker images to singularity #702

Closed MiladInk closed 2 months ago

MiladInk commented 11 months ago

Feature request

I am trying to run TGI on an HPC cluster. I tried pulling the Docker images with Singularity. The problem is that the custom kernels then do not work, and CUDA complains that the PTX was built with another version of the toolchain. Is there any solution to this? I also wanted to build the image from the Dockerfile, but the transition from a Dockerfile to a Singularity image is not straightforward.

What are my options? What is the problem with those custom kernels in my case?

Motivation

Many HPC clusters do not let users run Docker.

Your contribution

I can test the suggested methods to solve the problem and later on post the steps that worked for me.

YeechingTiger commented 11 months ago

I am also in an HPC environment. "--disable-custom-kernels" works for me. However, I don't know what the impact would be.

Narsil commented 11 months ago

What kind of GPU is it? H100? I'll look into this and see why it fails on some platforms. I'm guessing the kernels are built against an incompatible compute_arch.

Custom kernels shouldn't be necessary most of the time (they are only used for NEOX and BLOOM).

Narsil commented 11 months ago

Could be a duplicate of #739

Blair-Johnson commented 8 months ago

I've had success pulling the official Docker image for my platform and then building a Singularity image from the Docker archive:

docker pull --platform amd64 xxx
docker save xxx -o xxx.tar
singularity build xxx.sif docker-archive://xxx.tar
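
The three commands above can be wrapped in a small helper. This is only a sketch: the function names `run` and `convert_image` and the `DRY_RUN` variable are made up for illustration, and the image reference is still a placeholder you have to fill in yourself. With `DRY_RUN=1` the helper only prints the commands, which is handy for checking them on a machine without Docker.

```shell
# run: execute a command, or only print it when DRY_RUN=1 (hypothetical helper).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# convert_image IMAGE NAME (hypothetical helper):
# pull IMAGE with Docker, save it to NAME.tar, then build NAME.sif from the archive.
convert_image() {
  local image="$1" name="$2"
  run docker pull --platform amd64 "$image" &&
    run docker save "$image" -o "$name.tar" &&
    run singularity build "$name.sif" "docker-archive://$name.tar"
}
```

For example, `DRY_RUN=1 convert_image xxx xxx` prints the same three commands as above. The pull and save steps need a machine where Docker is allowed; the resulting .tar can then be copied to the cluster for the build step.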

rastna12 commented 8 months ago

I can confirm that disabling the custom kernels via the DISABLE_CUSTOM_KERNELS environment variable works for running an Apptainer container with an A100. If I do not set this variable, I get the same CUDA error: RuntimeError: CUDA error: the provided PTX was compiled with an unsupported toolchain. My full CLI command to start my container with a local LLM model is:

apptainer run \
  --nv \
  --bind $volume:/data \
  --env DISABLE_CUSTOM_KERNELS=true \
  hf_text_generation_inference_v110.sif --model-id ./data/models--bigcode--starcoderbase-3b/snapshots/e1c5ef4ebb97afa0db09ec3e520f0487ca350bbe/ --port 8000

I imagine that it would work the same for a Singularity container.

Fwiw, my driver and CUDA settings are: NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7

OlivierDehaene commented 8 months ago

It's possible that your driver is too old. As you can see, it supports CUDA versions up to 11.7, while TGI uses 11.8.
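
A rough way to check this on a compute node is to compare the "CUDA Version" that nvidia-smi prints in its header against the version the image needs. The sketch below assumes that header format and a `sort` that supports `-V` (GNU coreutils); the function names are made up for illustration, and 11.8 is simply the version mentioned in this thread.

```shell
# cuda_version_from_smi: read nvidia-smi header text on stdin and print the
# value that follows "CUDA Version:" (first match only).
cuda_version_from_smi() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# version_ge A B: succeed when version A >= version B (uses `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# driver_supports_cuda REQUIRED: succeed when the header on stdin reports a
# CUDA version that is at least REQUIRED.
driver_supports_cuda() {
  local found
  found="$(cuda_version_from_smi)"
  [ -n "$found" ] && version_ge "$found" "$1"
}
```

Usage on a GPU node: `nvidia-smi | driver_supports_cuda 11.8 || echo "driver may be too old for this image"`. With the header quoted above (CUDA Version: 11.7) the check fails, which matches the PTX error reported here.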

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.