triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version on FT #111

Open lkm2835 opened 1 year ago

lkm2835 commented 1 year ago

Hi, I was following the setup guide, ran into a bug, and worked out a fix.

https://github.com/triton-inference-server/fastertransformer_backend#setup

docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash

...

and https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-serving-on-single-node

/workspace/build/fastertransformer_backend/build/bin/gpt_gemm 8 1 32 16 64 4096 50257 1 1 1

fails with:

[FT][INFO] Arguments:
[FT][INFO]   batch_size: 8
[FT][INFO]   beam_width: 1
[FT][INFO]   max_input_len: 32
[FT][INFO]   head_num: 16
[FT][INFO]   size_per_head: 64
[FT][INFO]   inter_size: 4096
[FT][INFO]   vocab_size: 50257
[FT][INFO]   data_type: 1
[FT][INFO]   tensor_para_size: 1
[FT][INFO]   is_append: 1

terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/models/multi_gpu_gpt/gpt_gemm.cc:74

Aborted (core dumped)

To fix this, `docker` needs to be replaced with `nvidia-docker`:

nvidia-docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash

reference: https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#prepare
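If `nvidia-docker` is not installed, an alternative worth noting: Docker 19.03+ with the NVIDIA Container Toolkit can expose GPUs through the `--gpus` flag. A sketch of the equivalent command, assuming the same `WORKSPACE` and `TRITON_DOCKER_IMAGE` variables as above:

```shell
# Equivalent using plain docker with the NVIDIA Container Toolkit
# (requires Docker 19.03+ and nvidia-container-toolkit on the host).
docker run -it --gpus all \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```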

byshiue commented 1 year ago

Thank you for the feedback. There are many ways to use GPUs in Docker, and we treat this as an environment setting on the customer side, because not everyone installs nvidia-docker.