18810251126 opened 1 year ago
Please provide your reproduction steps.
```shell
export WORKSPACE=$(pwd)
docker run -it --rm --gpus=all --shm-size=1g --ulimit memlock=-1 \
    -v ${WORKSPACE}:${WORKSPACE} -w ${WORKSPACE} ${TRITON_DOCKER_IMAGE} bash

# Download the model from Hugging Face
sudo apt-get install git-lfs
git lfs install
git lfs clone https://huggingface.co/bert-base-uncased

# Clone FasterTransformer to convert the checkpoint
git clone https://github.com/NVIDIA/FasterTransformer.git
export PYTHONPATH=${WORKSPACE}/FasterTransformer:${PYTHONPATH}
python3 FasterTransformer/examples/pytorch/bert/utils/huggingface_bert_convert.py \
    -in_file bert-base-uncased/ \
    -saved_dir ${WORKSPACE}/all_models/bert/fastertransformer/1/ \
    -infer_tensor_para_size 2
```
```shell
# Install PyTorch, then run the GEMM autotuner
sudo pip3 install torch==1.12.1+cu116 torchvision==0.10.0+cu111 torchaudio==0.9.0 \
    -f https://download.pytorch.org/whl/torch_stable.html
/workspace/build/fastertransformer_backend/build/bin/bert_gemm 32 32 12 64 1 0 2
```
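The positional arguments to `bert_gemm` are easy to misread. Assuming the parameter order documented in the FasterTransformer BERT guide (batch size, sequence length, head number, size per head, data type, int8 mode, tensor parallel size), the call above can be decoded as:

```python
# Decode the positional arguments of the bert_gemm call above.
# Parameter names/order are an assumption based on the FasterTransformer
# BERT guide; check the guide for your exact version.
BERT_GEMM_ARGS = "32 32 12 64 1 0 2"

PARAM_NAMES = [
    "batch_size",        # maximum batch size to tune for
    "seq_len",           # maximum sequence length
    "head_num",          # attention heads (12 for bert-base)
    "size_per_head",     # hidden size per head (64 for bert-base)
    "data_type",         # 0 = FP32, 1 = FP16 (assumption)
    "int8_mode",         # 0 disables INT8
    "tensor_para_size",  # should match -infer_tensor_para_size above
]

gemm_config = dict(zip(PARAM_NAMES, map(int, BERT_GEMM_ARGS.split())))
print(gemm_config)
```

Note that `tensor_para_size` is 2 here, matching `-infer_tensor_para_size 2` used in the conversion step.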
```shell
# Copy the model config and launch the Triton server on two GPUs
cp config.pbtxt ${WORKSPACE}/all_models/bert/fastertransformer/
CUDA_VISIBLE_DEVICES=0,1 mpirun -n 1 --allow-run-as-root \
    /opt/tritonserver/bin/tritonserver --model-repository=${WORKSPACE}/all_models/bert/ &
```
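Before debugging further, it can help to confirm the server actually came up. Below is a minimal sketch using only the Python standard library, assuming Triton's default HTTP port 8000 and the KServe v2 health/inference endpoints; the `input_ids` tensor name is a placeholder, and the real name comes from the model's config.pbtxt:

```python
import json
import urllib.request

TRITON_URL = "http://localhost:8000"  # assumption: default Triton HTTP port


def server_ready(url=TRITON_URL, timeout=5):
    """Return True if Triton reports ready on the v2 health endpoint."""
    try:
        with urllib.request.urlopen(url + "/v2/health/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False


def build_infer_request(input_ids):
    """Build a KServe v2 HTTP inference request body.

    "input_ids" is a placeholder tensor name; use the input name
    declared in the model's config.pbtxt instead.
    """
    return json.dumps({
        "inputs": [{
            "name": "input_ids",
            "shape": [1, len(input_ids)],
            "datatype": "INT32",
            "data": input_ids,
        }]
    }).encode("utf-8")
```

If `server_ready()` returns False, the tritonserver log (backgrounded by the `&` above) usually shows why the fastertransformer model failed to load.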
Reference link: https://github.com/triton-inference-server/fastertransformer_backend/blob/main/docs/bert_guide.md
What is the version of your Docker image?
TRITON_VERSION=22.03, Docker version 20.10.17, build 100c701
Can you try 22.12, which is the version recommended in the documentation?
This doesn't seem to be a version issue.