Have you considered using FIL with Triton server? It offers a convenient method to serve tree models with CPUs and GPUs, and you can easily switch between them.
Hi, thanks for your reply. I had looked into that. What I want is actually not an inference backend; FIL inference is only a small part of my workflow, and I don't want to add the extra socket handling that an inference backend like Triton would introduce. I don't see a clear C API for fil_backend, or do you mean I should go through the code and leverage the API there?
By the way, from the CPU inference performance perspective, is FIL much better than Treelite (tl2cgen), or do they behave similarly right now?
Do we have a convenient way to install a CPU-only build of cuML? If not, after I build the CPU-only cuML, how should I integrate it back into the RAPIDS project (link them together)?
Currently, the CUML_ENABLE_GPU option is recognized only by FIL. Even with CUML_ENABLE_GPU specified, the build script will still attempt to build *.cu files because this flag is not properly recognized by other submodules in cuML. So for best results, you should also add -DCUML_ALGORITHMS=FIL in order to exclude all submodules other than FIL.
Build FIL, CPU only:
cmake -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_BUILD_TYPE=Release \
-DBUILD_CUML_C_LIBRARY=ON -DCUML_ALGORITHMS=FIL -DBUILD_CUML_TESTS=OFF \
-DBUILD_CUML_MPI_COMMS=OFF -DBUILD_CUML_MG_TESTS=OFF \
-DCUML_USE_TREELITE_STATIC=OFF -DNVTX=OFF -DUSE_CCACHE=OFF \
-DDISABLE_DEPRECATION_WARNINGS=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \
-DCUML_ENABLE_GPU=OFF ..
make
(assuming that you have a Conda environment set up according to BUILD.md)
Build FIL, with GPU code:
cmake -DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DCMAKE_BUILD_TYPE=Release \
-DBUILD_CUML_C_LIBRARY=ON -DCUML_ALGORITHMS=FIL -DBUILD_CUML_TESTS=OFF \
-DBUILD_CUML_MPI_COMMS=OFF -DBUILD_CUML_MG_TESTS=OFF \
-DCUML_USE_TREELITE_STATIC=OFF -DNVTX=OFF -DUSE_CCACHE=OFF \
-DDISABLE_DEPRECATION_WARNINGS=ON -DCMAKE_PREFIX_PATH=$CONDA_PREFIX \
-DCUML_ENABLE_GPU=ON ..
make
For my C++ implementation, is it enough to state that my execution is on CPU by using raft_proto::device_type::cpu in each function call? Or are other things, like some config or environment variables, needed?
Specifying raft_proto::device_type::cpu should be enough.
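For concreteness, here is a rough C++ sketch of what such a CPU-only call path could look like with the experimental FIL API. The header paths, the import_from_treelite_model helper, the tree_layout argument, raft_proto::handle_t, and the exact predict signature are my assumptions based on the experimental headers in recent cuML releases, not a verified snippet; please check them against the headers in your checkout.

// Rough sketch of CPU-only inference with the experimental FIL API.
// Header paths and function signatures are assumptions; verify them
// against cpp/include/cuml/experimental/fil in your cuML version.
#include <cuml/experimental/fil/forest_model.hpp>
#include <cuml/experimental/fil/treelite_importer.hpp>
#include <treelite/tree.h>

#include <cstddef>
#include <optional>
#include <vector>

void run_cpu_inference(treelite::Model const& tl_model,
                       std::vector<float>& rows,   // row-major, n_rows x n_cols
                       std::size_t n_rows,
                       std::size_t n_outputs_per_row)
{
  namespace fil = ML::experimental::fil;

  // Build FIL's in-memory forest on the host; device_type::cpu selects
  // the CPU code path at import time.
  auto model = fil::import_from_treelite_model(
    tl_model,
    fil::tree_layout::depth_first,       // assumed layout enum
    0u,                                  // align_bytes
    std::nullopt,                        // let FIL choose float precision
    raft_proto::device_type::cpu);       // execute on CPU

  std::vector<float> output(n_rows * n_outputs_per_row);

  // Both buffers live in host memory, so the input and output memory
  // types are also device_type::cpu.
  auto handle = raft_proto::handle_t{};
  model.predict(handle,
                output.data(),
                rows.data(),
                n_rows,
                raft_proto::device_type::cpu,   // output memory type
                raft_proto::device_type::cpu);  // input memory type
}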
I saw that the CPU inference code is integrated from Treelite. Is there any further performance improvement right now? Is boosting FIL CPU inference performance on the team's roadmap? From the CPU inference performance perspective, is FIL much better than Treelite (tl2cgen)?
Currently, FIL offers an "experimental" implementation of CPU inference. FIL used to be GPU-only, but we are in the process of switching to a unified implementation that can use both CPUs and GPUs. I won't claim state-of-the-art CPU performance for the experimental FIL, since the goal is to enable users to seamlessly switch between CPUs and GPUs in their inference stack.
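To illustrate that seamless-switch point under the same assumptions as the sketch above (so, still hypothetical signatures): moving the same call to a GPU-enabled build should, in principle, only require changing the device_type arguments and pointing the buffers at device memory.

// Same assumed API as the CPU sketch above; only the device_type
// arguments change, and d_input / d_output are hypothetical
// device-memory pointers.
auto gpu_model = fil::import_from_treelite_model(
  tl_model,
  fil::tree_layout::depth_first,
  0u,
  std::nullopt,
  raft_proto::device_type::gpu);       // execute on GPU

gpu_model.predict(handle,
                  d_output,                       // device pointer (hypothetical)
                  d_input,                        // device pointer (hypothetical)
                  n_rows,
                  raft_proto::device_type::gpu,   // output memory type
                  raft_proto::device_type::gpu);  // input memory type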
I just tried building cuML with the -DCUML_ALGORITHMS=FIL -DCUML_ENABLE_GPU=OFF flags and got a minimal libcuml++.so that only depends on system libs:
$ ldd libcuml++.so
linux-vdso.so.1 (0x00007ffdf5fe2000)
libgomp.so.1 => /home/phcho/mambaforge/envs/cuml_dev/lib/libgomp.so.1 (0x0000711c76529000)
libstdc++.so.6 => /home/phcho/mambaforge/envs/cuml_dev/lib/libstdc++.so.6 (0x0000711c76346000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x0000711c76249000)
libgcc_s.so.1 => /home/phcho/mambaforge/envs/cuml_dev/lib/libgcc_s.so.1 (0x0000711c7622a000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x0000711c75e00000)
/lib64/ld-linux-x86-64.so.2 (0x0000711c765f4000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x0000711c76223000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x0000711c7
What is your question?
I want to use FIL for a CPU-only inference task. The reason I don't use the GPU is that my streaming batch size is relatively small, so the data transfer time between CPU and GPU is unbearable for my latency-critical application.
The first option I tried was to directly create a conda environment with rapidsai-nightly 24.08. But it seems that libcuml++ is not compiled with the flag CUML_ENABLE_GPU=OFF; it will raise an error if I set CUDA_VISIBLE_DEVICES=''.
Code:
Output:
So I tried to build cuML from source with CUML_ENABLE_GPU=OFF, but this time it failed because the import path actually goes through CUDA-related code, while that part of the code is not compiled by cuML (I guess?). The Python script is just a convenient way to check whether the package is functional; I am actually using the C API for cuml::fil. And I execute my project with LD_PRELOAD, as I can only build libcuml++.so from source; the libcuml.so and libcumlprims_mg.so are from the original rapidsai conda environment, and I face problems like:
So my questions are:
Is it enough to state that my execution is on CPU by using raft_proto::device_type::cpu in each function call? Or are other things, like some config or environment variables, needed?
Thanks for your help,
Kaiyang