triton-inference-server / fil_backend

FIL backend for the Triton Inference Server
Apache License 2.0

Error when compiling fil_backend #395

Closed nv-kmcgill53 closed 4 months ago

nv-kmcgill53 commented 5 months ago

Triton Inference Server is seeing an error when attempting to compile the fil_backend. Below is the error reported by CMake:

[build-stage 3/4] RUN source /conda/dev/bin/activate  && cmake       --log-level=VERBOSE       -GNinja       -DCMAKE_BUILD_TYPE="Release"       -DBUILD_TESTS=""       -DTRITON_CORE_REPO_TAG="main"       -DTRITON_COMMON_REPO_TAG="main"       -DTRITON_BACKEND_REPO_TAG="main"       -DTRITON_ENABLE_GPU="ON"       -DTRITON_ENABLE_STATS="ON"       -DRAPIDS_DEPENDENCIES_VERSION="24.04"       -DTRITON_FIL_USE_TREELITE_STATIC="ON"       -DCMAKE_INSTALL_PREFIX=/rapids_triton/install       ..  && cd _deps/treelite-src  && git apply /rapids_triton/0001-Allow-predicting-with-FP32-input-and-FP64-models.patch  && cd ../..:

...

#32 43.85 CMake Error at build/_deps/raft-src/cpp/CMakeLists.txt:667 (target_link_libraries):
#32 43.85   The link interface of target "raft_distributed" contains:
#32 43.85 
#32 43.85     ucx::ucp
#32 43.85 
#32 43.85   but the target was not found.  Possible reasons include:
#32 43.85 
#32 43.85     * There is a typo in the target name.
#32 43.85     * A find_package call is missing for an IMPORTED target.
#32 43.85     * An ALIAS target is missing.
#32 43.85 
#32 43.85 
#32 43.85 
#32 43.85 CMake Generate step failed.  Build files cannot be regenerated correctly.

This failed our nightly build and looks to be related to the following change: https://github.com/triton-inference-server/fil_backend/pull/394

Let us know of any more information you need to troubleshoot, and thanks in advance for the help!
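For triage on a saved build log, the name of the missing target can be extracted with a small shell helper. This is only a sketch: the function name `missing_cmake_targets` is hypothetical, and it assumes the log has the same layout as the excerpt above (an optional `#<step> <seconds>` Docker prefix, a line ending in `contains:`, then the target name on its own line).

```shell
#!/usr/bin/env bash
# Hypothetical helper: list link-interface targets that CMake reported as missing.
# Assumes the log layout from the excerpt above; other CMake versions may differ.
missing_cmake_targets() {
  # Drop the Docker build prefix (e.g. "#32 43.85 "), then print the non-empty
  # lines between "contains:" and "but the target was not found".
  sed -E 's/^#[0-9]+ [0-9.]+ ?//' "$1" |
    awk '
      /contains:$/                   { grab = 1; next }
      /but the target was not found/ { grab = 0 }
      grab && NF                     { print $1 }
    '
}
```

Calling `missing_cmake_targets <log-file>` on the output above should print the target that CMake could not resolve, which makes it easier to search the dependency's CMake files for the matching `find_package` call.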

hcho3 commented 4 months ago

Can you post the Docker command I can use to reproduce the error?

krishung5 commented 4 months ago

Hi @hcho3, we use the steps below to build the FIL backend:

git clone --recursive --single-branch --depth=1 -b main https://github.com/triton-inference-server/fil_backend.git fil

cd fil

mkdir build && cd build

cmake -DTRT_VERSION=10.1.0.27+cuda12.4.1.003 -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DTRITON_FIL_DOCKER_BUILD:BOOL=ON -DTRITON_BUILD_CONTAINER=nvcr.io/nvidia/tritonserver:24.06-py3-min -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_REPO_ORGANIZATION:STRING=https://github.com/triton-inference-server -DTRITON_COMMON_REPO_TAG:STRING=main -DTRITON_CORE_REPO_TAG:STRING=main -DTRITON_BACKEND_REPO_TAG:STRING=main -DTRITON_ENABLE_GPU:BOOL=ON -DTRITON_ENABLE_MALI_GPU:BOOL=OFF -DTRITON_ENABLE_STATS:BOOL=ON -DTRITON_ENABLE_METRICS:BOOL=ON -DTRITON_ENABLE_MEMORY_TRACKER:BOOL=ON ..

cmake --build . --config Release -j256 -v -t install

hcho3 commented 4 months ago

@krishung5 Did you apply the patch 0001-Allow-predicting-with-FP32-input-and-FP64-models.patch to _deps/treelite-src ?

krishung5 commented 4 months ago

I don't think we did. Is the patch required to build on the main or r24.07 branch? Let me try that in the meantime.

hcho3 commented 4 months ago

Yes. The patch is required. Does your build system not use https://github.com/triton-inference-server/fil_backend/blob/main/build.sh from this repo? The build.sh should have picked up the patch automatically.

hcho3 commented 4 months ago

Running this command

cmake -DTRT_VERSION=10.1.0.27+cuda12.4.1.003 -DCMAKE_TOOLCHAIN_FILE= -DVCPKG_TARGET_TRIPLET= -DTRITON_FIL_DOCKER_BUILD:BOOL=ON -DTRITON_BUILD_CONTAINER=nvcr.io/nvidia/tritonserver:24.06-py3-min -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX:PATH=`pwd`/install -DTRITON_REPO_ORGANIZATION:STRING=https://github.com/triton-inference-server -DTRITON_COMMON_REPO_TAG:STRING=main -DTRITON_CORE_REPO_TAG:STRING=main -DTRITON_BACKEND_REPO_TAG:STRING=main -DTRITON_ENABLE_GPU:BOOL=ON -DTRITON_ENABLE_MALI_GPU:BOOL=OFF -DTRITON_ENABLE_STATS:BOOL=ON -DTRITON_ENABLE_METRICS:BOOL=ON -DTRITON_ENABLE_MEMORY_TRACKER:BOOL=ON ..

invokes the Docker build with ops/Dockerfile, which already applies the patch automatically, so there is no need to apply the patch manually.
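For builds that do not go through ops/Dockerfile, one way to avoid applying the patch twice is to check first whether it reverses cleanly. The following is a minimal sketch, not part of this repo: the function name `apply_patch_once` is hypothetical, and it assumes the patch path is absolute (because `git -C` changes directory before the path is resolved).

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply a patch only if it is not already in the tree.
# Usage: apply_patch_once <source-dir> <absolute-path-to-patch>
apply_patch_once() {
  local repo="$1" patch="$2"
  if git -C "$repo" apply --reverse --check "$patch" 2>/dev/null; then
    # The patch can be reversed cleanly, so it has already been applied.
    echo "already applied"
  elif git -C "$repo" apply --check "$patch" 2>/dev/null; then
    # The patch applies cleanly, so apply it for real.
    git -C "$repo" apply "$patch" && echo "applied"
  else
    echo "patch does not apply cleanly" >&2
    return 1
  fi
}
```

With the paths from the build log earlier in this thread, the call would look like `apply_patch_once _deps/treelite-src /rapids_triton/0001-Allow-predicting-with-FP32-input-and-FP64-models.patch`, run from the build directory.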

hcho3 commented 4 months ago

Just tried the build command (cmake -DTRT_VERSION=...) on my end. I can't reproduce the error at all. Let me clean the Docker cache and try again.

hcho3 commented 4 months ago

@krishung5 Can you try this fix? #396

krishung5 commented 4 months ago

@hcho3 Confirmed that I was able to build fil_backend with the fix. Thanks for the quick fix! Could we bring this patch to the release branch as well?

hcho3 commented 4 months ago

Sure, let me bring it to the release branch.