triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Does fastertransformer support nvcr.io/nvidia/tritonserver:21.07-py3? #68

Closed changleilei closed 1 year ago

changleilei commented 1 year ago

Description

branch: https://github.com/triton-inference-server/fastertransformer_backend.git (main)
docker image: nvcr.io/nvidia/tritonserver:21.07-py3
GPU: Tesla V100

When I follow the steps to build the FasterTransformer backend, the following error happens:

  45 |   BackendModel(TRITONBACKEND_Model* triton_model);
/workspace/build/_deps/repo-backend-src/include/triton/backend/backend_model.h:45:3: note: candidate expects 1 argument, 2 provided 
[100%] Linking CXX executable ../../../../../bin/multi_gpu_gpt_triton_example
[100%] Built target multi_gpu_gpt_triton_example
make[2]: *** [CMakeFiles/triton-fastertransformer-backend.dir/build.make:82: CMakeFiles/triton-fastertransformer-backend.dir/src/libfastertransformer.cc.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:1457: CMakeFiles/triton-fastertransformer-backend.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/workspace/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.h: In function 'bool test_context_sharing(const string&, const string&) [with T = float]':
/workspace/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.h:72:144: warning: 'pipeline_para.fastertransformer::NcclParam::nccl_uid_' may be used uninitialized in this function [-Wmaybe-uninitialized]
  72 |   NcclParam(NcclParam const& param):
/workspace/build/_deps/repo-ft-src/src/fastertransformer/utils/nccl_utils.h:72:144: warning: 'tensor_para.fastertransformer::NcclParam::nccl_uid_' may be used uninitialized in this function [-Wmaybe-uninitialized]
  72 |   NcclParam(NcclParam const& param):
[100%] Linking CXX executable ../../../../bin/test_context_decoder_layer
[100%] Built target test_context_decoder_layer
[100%] Linking CXX executable ../../../../bin/test_sampling
[100%] Built target test_sampling
make: *** [Makefile:149: all] Error 2
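
For context on the first error: building the main branch of fastertransformer_backend against the r21.07 backend headers appears to be the problem, since main passes two arguments to a BackendModel constructor that, per the note above, accepts only one in that release. A quick way to confirm what the pinned header actually declares, run inside the build container (the path is taken from the log above):

    grep -n "BackendModel(" /workspace/build/_deps/repo-backend-src/include/triton/backend/backend_model.h
    # Expect a single-argument constructor around backend_model.h:45 in the
    # r21.07 checkout; main-branch fastertransformer_backend calls it with two
    # arguments, hence "candidate expects 1 argument, 2 provided".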

Reproduced Steps

1. git clone https://github.com/triton-inference-server/fastertransformer_backend.git
cd fastertransformer_backend
export WORKSPACE=$(pwd)
export CONTAINER_VERSION=21.07
export TRITON_DOCKER_IMAGE=triton_with_ft:${CONTAINER_VERSION}

2. docker run -it \
    -v ${WORKSPACE}:/workspace \
    --name ft_backend_builder \
    nvcr.io/nvidia/tritonserver:21.07-py3  bash
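
One sanity check worth doing right after entering the container: the cmake invocation in step 5 derives its repo tags from the NVIDIA_TRITON_SERVER_VERSION environment variable that NGC Triton images set, so it pays to verify it early. A minimal check, assuming the 21.07 image behaves like other NGC releases:

    echo "${NVIDIA_TRITON_SERVER_VERSION}"   # should print 21.07 inside this container
    cat /opt/tritonserver/TRITON_VERSION     # Triton's own version file, if present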
3. rm /opt/tritonserver/lib/cmake/FasterTransformer/ -rf # Remove original library
cd fastertransformer_backend

4. apt-key del 7fa2af80 && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub && \
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu2004/x86_64/7fa2af80.pub && \
    apt-get update && \
    apt-get install -y --no-install-recommends openssh-server zsh tmux mosh locales-all clangd sudo \
        zip unzip wget build-essential autoconf autogen gdb python3.8 python3-pip python3-dev rapidjson-dev \
        xz-utils zstd libz-dev && \
    pip3 install torch==1.9.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html  && \
    pip3 install --extra-index-url https://pypi.ngc.nvidia.com regex fire tritonclient[all] && \
    pip3 install transformers huggingface_hub tokenizers SentencePiece sacrebleu && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

export CMAKE_VERSION=3.18 && \
export CMAKE_BUILD=3.18.4 && \
    wget -nv https://cmake.org/files/v${CMAKE_VERSION}/cmake-${CMAKE_BUILD}.tar.gz && \
    tar -xf cmake-${CMAKE_BUILD}.tar.gz && \
    cd cmake-${CMAKE_BUILD} && \
    ./bootstrap --parallel=$(grep -c ^processor /proc/cpuinfo) -- -DCMAKE_USE_OPENSSL=OFF && \
    make -j"$(grep -c ^processor /proc/cpuinfo)" install && \
    cd /workspace/build/ && \
    rm -rf /workspace/build/cmake-${CMAKE_BUILD}
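
After the bootstrap and install finish, it helps to confirm that the freshly built CMake is the one on PATH, since a stale distro cmake can cause unrelated configure failures in step 5. A minimal check, with no assumptions beyond the steps above:

    hash -r                 # clear bash's cached command locations
    cmake --version         # should now report 3.18.4
    command -v cmake        # typically /usr/local/bin/cmake after `make install`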
5. mkdir build -p && cd build && \
    cmake \
      -D CMAKE_EXPORT_COMPILE_COMMANDS=1 \
      -D CMAKE_BUILD_TYPE=Release \
      -D CMAKE_INSTALL_PREFIX=/opt/tritonserver \
      -D TRITON_COMMON_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      -D TRITON_CORE_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      -D TRITON_BACKEND_REPO_TAG="r${NVIDIA_TRITON_SERVER_VERSION}" \
      -D 
      .. && \
    make -j"$(grep -c ^processor /proc/cpuinfo)" install
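
For reference, the three TRITON_*_REPO_TAG options pin the common, core, and backend repositories to the release branch matching the container, so with the environment from step 2 they resolve to r21.07. A quick way to see the branch the build will fetch, assuming NVIDIA_TRITON_SERVER_VERSION is set as above:

    echo "r${NVIDIA_TRITON_SERVER_VERSION}"   # -> r21.07, the branch cmake checks out
    # Building main-branch backend sources against these older r21.07 headers
    # is what triggers the constructor-signature error shown in the log.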
byshiue commented 1 year ago

The latest FasterTransformer backend requires a newer tritonserver docker image because several new features are only supported in newer Triton releases.

We suggest using the Triton docker image recommended in the document.
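
In practice that means rebuilding inside a newer base image. A minimal sketch, assuming for illustration the 22.07 release (check the README for the exact image the repo currently recommends):

    export CONTAINER_VERSION=22.07            # hypothetical newer release; see README
    docker pull nvcr.io/nvidia/tritonserver:${CONTAINER_VERSION}-py3
    docker run -it -v ${WORKSPACE}:/workspace --name ft_backend_builder \
        nvcr.io/nvidia/tritonserver:${CONTAINER_VERSION}-py3 bash
    # Then repeat steps 3-5 above; the TRITON_*_REPO_TAG values now resolve
    # to r22.07, matching the headers the main branch expects.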

changleilei commented 1 year ago

Thank you !!