lkm2835 opened 1 year ago
Hi, I'm following the setup guide.
I found a bug and solved it.
https://github.com/triton-inference-server/fastertransformer_backend#setup
```shell
docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
...
```
and https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/gpt_guide.md#run-serving-on-single-node
```shell
/workspace/build/fastertransformer_backend/build/bin/gpt_gemm 8 1 32 16 64 4096 50257 1 1 1
```
fails with:

```
[FT][INFO] Arguments:
[FT][INFO] batch_size: 8
[FT][INFO] beam_width: 1
[FT][INFO] max_input_len: 32
[FT][INFO] head_num: 16
[FT][INFO] size_per_head: 64
[FT][INFO] inter_size: 4096
[FT][INFO] vocab_size: 50257
[FT][INFO] data_type: 1
[FT][INFO] tensor_para_size: 1
[FT][INFO] is_append: 1
terminate called after throwing an instance of 'std::runtime_error'
  what():  [FT][ERROR] CUDA runtime error: CUDA driver version is insufficient for CUDA runtime version /workspace/build/fastertransformer_backend/build/_deps/repo-ft-src/src/fastertransformer/models/multi_gpu_gpt/gpt_gemm.cc:74
Aborted (core dumped)
```
The fix is to change `docker` to `nvidia-docker`:

```shell
nvidia-docker run -it \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```
Reference: https://github.com/NVIDIA/FasterTransformer/blob/main/docs/gpt_guide.md#prepare
Thank you for the feedback. There are many ways to use GPUs in Docker, and we treat this as an environment setting on the customer's side, since not everyone installs nvidia-docker.
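For reference, one such alternative that avoids the `nvidia-docker` wrapper entirely: Docker 19.03+ can expose GPUs to a container directly through the `--gpus` flag, provided the NVIDIA Container Toolkit is installed on the host. A sketch of the same build-container command under that assumption (not the official setup instructions):

```shell
# Assumes: Docker >= 19.03 and the NVIDIA Container Toolkit installed on the host.
# --gpus all makes every host GPU visible inside the container, so the
# nvidia-docker wrapper is not needed.
docker run -it \
    --gpus all \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    ${TRITON_DOCKER_IMAGE} bash
```

Inside the container, `nvidia-smi` should then list the host GPUs; if it reports no devices, the toolkit is likely not configured.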