pytorch / serve

Serve, optimize and scale PyTorch models in production
https://pytorch.org/serve/
Apache License 2.0

GPU not detected inside torchserve docker container #3352

Closed dummyuser-123 closed 1 month ago

dummyuser-123 commented 1 month ago

🐛 Describe the bug

I am trying to create a Docker image for my custom diffusers handler. The image builds and the container starts, but the container cannot detect the GPU. I am using the official TorchServe GPU image from Docker Hub as the base, and I have passed --gpus all to the docker run command, but the GPU is still not usable inside the container.

How can I enable the GPU inside the container so that my custom handler can use it?
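A useful first step when debugging this is to check whether Docker itself can expose the GPU, independent of TorchServe. A minimal sketch (it assumes the NVIDIA Container Toolkit, or a recent Docker Desktop with WSL2 GPU support on Windows, is installed; the CUDA image tag is only illustrative):

```shell
# Is the host GPU driver visible to Docker at all?
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

# Does PyTorch inside the TorchServe base image see the GPU?
docker run --rm --gpus all pytorch/torchserve:0.12.0-gpu \
    python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"
```

If the first command already fails, the problem is in the Docker/driver setup (on Windows, typically Docker Desktop's WSL2 GPU integration) rather than in the image or the handler.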

Error logs

WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-23T06:11:55,474 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": error=2, No such file or directory
2024-10-23T06:11:55,498 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-23T06:11:55,513 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-23T06:11:55,560 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
2024-10-23T06:11:55,750 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: /home/venv/lib/python3.9/site-packages
Current directory: /home/model-server
Temp directory: /home/model-server/tmp
Metrics config path: /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 1966 M
Python executable: /home/venv/bin/python
Config file: /home/model-server/config.properties
Inference address: http://0.0.0.0:8080
Management address: http://0.0.0.0:8081
Metrics address: http://0.0.0.0:8082
Model Store: /home/model-server/model-store
Initial Models: all
Log dir: /home/model-server/logs
Metrics dir: /home/model-server/logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: /home/model-server/model-store
CPP log config: N/A
Model config: {"text-to-image": {"1.0": {"defaultVersion": true,"marName": "text-to-image.mar","minWorkers": 1,"maxWorkers": 1,"batchSize": 4,"maxBatchDelay": 5000,"responseTimeout": 120}}}
System metrics command: default
Model API enabled: true
2024-10-23T06:11:55,762 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-10-23T06:11:55,763 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: text-to-image.mar
2024-10-23T06:12:10,680 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model text-to-image
2024-10-23T06:12:10,681 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model text-to-image
2024-10-23T06:18:40,296 [INFO ] main org.pytorch.serve.wlm.ModelManager - Installed custom pip packages for model text-to-image
2024-10-23T06:18:40,297 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model text-to-image loaded.
2024-10-23T06:18:40,297 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: text-to-image, count: 1
2024-10-23T06:18:40,329 [DEBUG] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [/home/venv/bin/python, /home/venv/lib/python3.9/site-packages/ts/model_service_worker.py, --sock-type, unix, --sock-name, /home/model-server/tmp/.ts.sock.9000, --metrics-config, /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml]
2024-10-23T06:18:40,334 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2024-10-23T06:18:40,443 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2024-10-23T06:18:40,444 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: EpollServerSocketChannel.
2024-10-23T06:18:40,446 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://0.0.0.0:8081
2024-10-23T06:18:40,446 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2024-10-23T06:18:40,458 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://0.0.0.0:8082
Model server started.
2024-10-23T06:18:40,741 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-10-23T06:18:41,407 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,408 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:909.3278884887695|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,408 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:46.310420989990234|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,409 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:4.8|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,409 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:7.568359375|#Level:Host,DeviceId:0|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,409 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:620.0|#Level:Host,DeviceId:0|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,410 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:4.0|#Level:Host,DeviceId:0|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,410 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6439.8203125|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,410 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1127.33984375|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,410 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:18.1|#Level:Host|#hostname:8a718bc19470,timestamp:1729664321
2024-10-23T06:18:41,824 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - s_name_part0=/home/model-server/tmp/.ts.sock, s_name_part1=9000, pid=59
2024-10-23T06:18:41,825 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Listening on port: /home/model-server/tmp/.ts.sock.9000
2024-10-23T06:18:41,833 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Successfully loaded /home/venv/lib/python3.9/site-packages/ts/configs/metrics.yaml.
2024-10-23T06:18:41,833 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - [PID]59
2024-10-23T06:18:41,833 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-23T06:18:41,834 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Python runtime: 3.9.20
2024-10-23T06:18:41,834 [DEBUG] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-text-to-image_1.0 State change null -> WORKER_STARTED
2024-10-23T06:18:41,839 [INFO ] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /home/model-server/tmp/.ts.sock.9000
2024-10-23T06:18:41,845 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Connection accepted: /home/model-server/tmp/.ts.sock.9000.
2024-10-23T06:18:41,847 [DEBUG] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1729664321847
2024-10-23T06:18:41,849 [INFO ] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1729664321849
2024-10-23T06:18:41,868 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - model_name: text-to-image, batchSize: 4
2024-10-23T06:18:42,034 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-23T06:18:42,034 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - /home/model-server/tmp/models/624f26e469db467981f6f46a96687683/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 500: named symbol not found (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
2024-10-23T06:18:42,035 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -   return torch._C._cuda_getDeviceCount() > 0
2024-10-23T06:18:42,035 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-23T06:18:42,035 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-23T06:18:42,209 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - The cache for model files in Transformers v4.22.0 has been updated. Migrating your old cache. This is a one-time only operation. You can interrupt this and resume the migration later on by calling `transformers.utils.move_cache()`.
2024-10-23T06:18:42,212 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -
2024-10-23T06:18:42,212 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - 0it [00:00, ?it/s]
2024-10-23T06:18:42,213 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - 0it [00:00, ?it/s]
2024-10-23T06:18:42,755 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Diffusers version 0.30.0
2024-10-23T06:18:42,755 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Model Directory: /home/model-server/tmp/models/624f26e469db467981f6f46a96687683
2024-10-23T06:19:05,544 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Device: cpu
2024-10-23T06:19:05,545 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Diffusion model extracted successfully
2024-10-23T06:19:05,546 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - /home/model-server/tmp/models/624f26e469db467981f6f46a96687683/torch/_utils.py:831: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.  To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
2024-10-23T06:19:05,546 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -   return self.fget.__get__(instance, owner)()
2024-10-23T06:19:07,874 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - `local_files_only` is True but no local configs were found for this checkpoint.
2024-10-23T06:19:07,874 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Attempting to download the necessary config files for this pipeline.
2024-10-23T06:19:07,874 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -
2024-10-23T06:19:08,710 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -
2024-10-23T06:19:09,807 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Fetching 11 files:   0%|          | 0/11 [00:00<?, ?it/s]
2024-10-23T06:19:10,039 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Fetching 11 files:   9%|▉         | 1/11 [00:01<00:10,  1.10s/it]
2024-10-23T06:19:10,918 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Fetching 11 files:  55%|█████▍    | 6/11 [00:01<00:00,  5.71it/s]
2024-10-23T06:19:10,918 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Fetching 11 files:  82%|████████▏ | 9/11 [00:02<00:00,  4.43it/s]
2024-10-23T06:19:10,919 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Fetching 11 files: 100%|██████████| 11/11 [00:02<00:00,  4.98it/s]
2024-10-23T06:19:10,920 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -
2024-10-23T06:19:11,006 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Loading pipeline components...:   0%|          | 0/6 [00:00<?, ?it/s]Some weights of the model checkpoint were not used when initializing CLIPTextModel:
2024-10-23T06:19:11,006 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -  ['text_model.embeddings.position_ids']
2024-10-23T06:19:11,053 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG -
2024-10-23T06:19:11,253 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Loading pipeline components...:  67%|██████▋   | 4/6 [00:00<00:00, 30.12it/s]
2024-10-23T06:19:11,253 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Loading pipeline components...: 100%|██████████| 6/6 [00:00<00:00, 18.04it/s]
2024-10-23T06:19:11,254 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - You have disabled the safety checker for <class 'diffusers.pipelines.controlnet.pipeline_controlnet_img2img.StableDiffusionControlNetImg2ImgPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
2024-10-23T06:19:11,521 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
2024-10-23T06:19:11,532 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
2024-10-23T06:19:11,535 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
2024-10-23T06:19:11,537 [WARN ] W-9000-text-to-image_1.0-stderr MODEL_LOG - Pipelines loaded with `dtype=torch.float16` cannot run with `cpu` device. It is not recommended to move them to `cpu` as running them will fail. Please make sure to use an accelerator to run the pipeline in inference, due to the lack of support for`float16` operations on this device in PyTorch. Please, remove the `torch_dtype=torch.float16` argument, or use another device for inference.
2024-10-23T06:19:11,547 [INFO ] W-9000-text-to-image_1.0-stdout MODEL_LOG - Diffusion model from path /home/model-server/tmp/models/624f26e469db467981f6f46a96687683 loaded successfully
2024-10-23T06:19:11,548 [INFO ] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 29699
2024-10-23T06:19:11,549 [DEBUG] W-9000-text-to-image_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-text-to-image_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-10-23T06:19:11,549 [INFO ] W-9000-text-to-image_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:31227.0|#WorkerName:W-9000-text-to-image_1.0,Level:Host|#hostname:8a718bc19470,timestamp:1729664351
2024-10-23T06:19:11,549 [INFO ] W-9000-text-to-image_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:3.0|#Level:Host|#hostname:8a718bc19470,timestamp:1729664351

Installation instructions

Yes, I have installed TorchServe from source.

Model Packaging

I have followed this link to package the model locally.

config.properties

inference_address=http://0.0.0.0:8080
management_address=http://0.0.0.0:8081
metrics_address=http://0.0.0.0:8082
enable_envvars_config=true
install_py_dep_per_model=true
load_models=all
model_store=/home/model-server/model-store

models={\
  "text-to-image": {\
    "1.0": {\
      "defaultVersion": true,\
      "marName": "text-to-image.mar",\
      "minWorkers": 1,\
      "maxWorkers": 1,\
      "batchSize": 4,\
      "maxBatchDelay": 5000,\
      "responseTimeout": 120\
    }\
  }\
}

Versions

torchserve==0.12.0
torch-model-archiver==0.12.0

Python version: 3.10 (64-bit runtime)
Python executable: D:\Text-to-Image\env\Scripts\python.exe

Versions of relevant python libraries:
numpy==2.1.2
torch==2.4.1+cu118
torch-model-archiver==0.12.0
torchserve==0.12.0
Warning: torchtext not present ..
Warning: torchvision not present ..
Warning: torchaudio not present ..

Java Version:

OS: Microsoft Windows 11 Pro
GCC version: N/A
Clang version: N/A
CMake version: 3.27.9

Is CUDA available: Yes
CUDA runtime version: 11.8.89
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3050
Nvidia driver version: 537.58
cuDNN version: None

Repro instructions

My folder structure looks like the following:

  1. custom_handler.py
  2. config.properties
  3. Dockerfile
  4. model.zip
  5. requirements.txt

This is what my Dockerfile looks like:

FROM pytorch/torchserve:0.12.0-gpu

USER root

COPY . /home/model-server/

USER model-server

ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility

EXPOSE 8080
EXPOSE 8081

RUN torch-model-archiver \
    --model-name text-to-image \
    --version 1.0 \
    --handler /home/model-server/custom_handler.py \
    --export-path /home/model-server/model-store \
    --extra-files /home/model-server/model.zip \
    -r requirements.txt

CMD ["torchserve", "--start", "--ts-config=/home/model-server/config.properties", "--disable-token-auth", "--enable-model-api"]

And these are the commands that I run to build the Docker image and start the container:

  1. docker build -t text-to-image .
  2. docker run --name mycontainer --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -p 127.0.0.1:8082:8082 text-to-image
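Once the container is running, GPU visibility can be checked from inside it, e.g. (a sketch; `mycontainer` is the name used in the run command above, and `python` resolves to the image's venv interpreter):

```shell
# Is the GPU visible to the container at the driver level?
docker exec -it mycontainer nvidia-smi

# Is the GPU visible to PyTorch in the serving environment?
docker exec -it mycontainer python -c "import torch; print(torch.cuda.is_available())"
```

If `nvidia-smi` works but the PyTorch check prints False (as the `cudaGetDeviceCount` warning in the logs above suggests), the mismatch is usually between the container runtime and the host driver stack rather than in TorchServe itself.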

Possible Solution

No response

dummyuser-123 commented 1 month ago

The error was resolved after updating Docker Desktop to the latest version.