Open sandeepb2013 opened 11 months ago
Thank you for the report! Could you post how the model was generated and the model config file you used to load it into Triton?
Possibly related: https://github.com/dmlc/treelite/issues/364. If that is indeed the underlying issue, the use_experimental_optimizations flag may be a workaround for the moment.
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

seed = 7
features = 9  # number of sample features
samples = 10000  # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"], stdout=subprocess.PIPE, preexec_fn=os.setsid)
name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1 ] # Output 2 for binary classification model
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "predict_proba" value: { string_value: "false" } },
  { key: "output_class" value: { string_value: "true" } },
  { key: "threshold" value: { string_value: "0.5" } },
  { key: "algo" value: { string_value: "ALGO_AUTO" } },
  { key: "storage_type" value: { string_value: "AUTO" } },
  { key: "blocks_per_sm" value: { string_value: "0" } }
]
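A side note on the script above: it starts tritonserver with subprocess.Popen and pipes stdout without reading it, so a failure during model load is easy to miss. One option is to poll the server's readiness endpoint after launching it. The following is a minimal sketch, assuming the HTTP endpoint is on the default port 8000 and that /v2/health/ready is reachable from where the script runs; http_ready and wait_until_ready are hypothetical helper names, not part of Triton.

```python
import time
import urllib.error
import urllib.request


def http_ready(url="http://localhost:8000/v2/health/ready", timeout=1.0):
    """Return True if the endpoint answers HTTP 200, False on any error.

    The URL is an assumption (default HTTP port and readiness route).
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status all count as "not ready".
        return False


def wait_until_ready(probe=http_ready, attempts=30, delay=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return True as soon as it succeeds."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

In the script this would be called right after subprocess.Popen(...); if it returns False, checking triton_process.poll() shows whether the server process has already exited.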
For building the Docker image:
Hmmm... I don't see why that particular model would trigger that Treelite issue, so we may need to dig deeper. Can you try the use_experimental_optimizations flag and let me know if you can successfully run the model with that flag?
Apologies; I was too hasty when I was thinking about this before. As soon as I saw LLVM, I was thinking about Treelite compiled models, but the FIL backend does not and has never invoked Treelite compiled models. CPU execution is performed through GTIL or our internal optimized CPU implementation.
Can you give us a little more detail on exactly how you got this error? Are there any more details available on the workflow? LLVM should not be involved with Triton at all at the deployment stage.
Hi @wphicks,
Using the build script (https://github.com/triton-inference-server/fil_backend/blob/main/docs/build.md), I was able to build 2 Docker images:
----------------------------------------
REPOSITORY TAG IMAGE ID CREATED SIZE
After running the Docker image I was able to access the environment but not the Jupyter notebook, so I created a Python script:
----------------- sample.py --------------------
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

seed = 7
features = 9  # number of sample features
samples = 10000  # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"], stdout=subprocess.PIPE, preexec_fn=os.setsid)

----------------- config.pbtxt --------------------
name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1 ] # Output 2 for binary classification model
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "predict_proba" value: { string_value: "false" } },
  { key: "output_class" value: { string_value: "true" } },
  { key: "threshold" value: { string_value: "0.5" } },
  { key: "algo" value: { string_value: "ALGO_AUTO" } },
  { key: "storage_type" value: { string_value: "AUTO" } },
  { key: "blocks_per_sm" value: { string_value: "0" } }
]
Finally, when running sample.py, the "LLVM" error appears.
I cross-checked the model, the config.pbtxt, and the structure of the model repository.
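The cross-check of the repository structure can be made repeatable with a small script. This is a minimal sketch, assuming the layout used in this thread (<repository>/fil/config.pbtxt next to <repository>/fil/1/xgboost.model); check_model_repository is a hypothetical helper, not a Triton tool, and it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def check_model_repository(repo, model_name="fil", version="1",
                           model_filename="xgboost.model"):
    """Return a list of problems found; an empty list means the layout looks right.

    The default names mirror the paths posted in this thread and are
    assumptions for any other setup.
    """
    problems = []
    model_dir = Path(repo) / model_name
    config = model_dir / "config.pbtxt"
    model_file = model_dir / version / model_filename

    if not config.is_file():
        problems.append(f"missing {config}")
    if not model_file.is_file():
        problems.append(f"missing {model_file}")
    elif model_file.stat().st_size == 0:
        problems.append(f"empty model file {model_file}")
    return problems
```

For the repository in this thread the call would be check_model_repository("/opt/tritonserver/notebooks/simple-xgboost/model_repository").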
Hi @wphicks, any further pointers would really help. Thanks in advance.
When I looked into it further, it seemed that another backend (PyTorch) could be the reason for the LLVM issue. However, I'm more interested in trying out the FIL backend, so I kept only the FIL backend in the Triton backends directory, and I am now facing the error below.
I1121 10:41:01.087972 1 model_lifecycle.cc:462] loading: fil:1
I1121 10:41:01.088345 1 backend_model.cc:364] Adding default backend config setting: default-max-batch-size,4
I1121 10:41:01.088435 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/fil/libtriton_fil.so
I1121 10:41:01.092161 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1121 10:41:01.092195 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1121 10:41:01.092203 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1121 10:41:01.092240 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1121 10:41:01.093017 1 model_config_utils.cc:1872] ModelConfig 64-bit fields:
I1121 10:41:01.093053 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_priority_level
I1121 10:41:01.093061 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I1121 10:41:01.093068 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I1121 10:41:01.093074 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_levels
I1121 10:41:01.093081 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::key
I1121 10:41:01.093088 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I1121 10:41:01.093095 1 model_config_utils.cc:1874] ModelConfig::ensemble_scheduling::step::model_version
I1121 10:41:01.093102 1 model_config_utils.cc:1874] ModelConfig::input::dims
I1121 10:41:01.093110 1 model_config_utils.cc:1874] ModelConfig::input::reshape::shape
I1121 10:41:01.093117 1 model_config_utils.cc:1874] ModelConfig::instance_group::secondary_devices::device_id
I1121 10:41:01.093123 1 model_config_utils.cc:1874] ModelConfig::model_warmup::inputs::value::dims
I1121 10:41:01.093130 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I1121 10:41:01.093138 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I1121 10:41:01.093145 1 model_config_utils.cc:1874] ModelConfig::output::dims
I1121 10:41:01.093152 1 model_config_utils.cc:1874] ModelConfig::output::reshape::shape
I1121 10:41:01.093159 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I1121 10:41:01.093166 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I1121 10:41:01.093173 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I1121 10:41:01.093234 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::dims
I1121 10:41:01.093244 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::initial_state::dims
I1121 10:41:01.093252 1 model_config_utils.cc:1874] ModelConfig::version_policy::specific::versions
I1121 10:41:01.094102 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
I1121 10:41:01.094137 1 backend_model_instance.cc:69] Creating instance fil_0_0 on CPU using artifact ''
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
Is there a specific minimum memory requirement for the FIL backend to start? Thanks.
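When chasing a std::bad_alloc like the one above, it can help to record how much memory the environment reports before Triton starts. The following is a minimal sketch, assuming a Linux system where /proc/meminfo is readable; parse_meminfo and available_mib are hypothetical helper names. Inside a container this file may show the host's memory rather than any limit applied to the container, so treat the number as a rough indication only.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        # Lines look like "MemAvailable:    2048000 kB"; keep the leading integer.
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_mib(text):
    """Return the MemAvailable field converted from kB to MiB."""
    return parse_meminfo(text)["MemAvailable"] / 1024
```

On the machine in question this could be called as available_mib(open("/proc/meminfo").read()) just before launching tritonserver.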
@sandeepb2013 Could you try with an officially-released Triton Docker image and enable use_experimental_optimizations in your config.pbtxt? The memory requirements should be quite modest, though they'll depend on the details of the model. If you still run into issues, can you see how far you get running either the fraud detection or FAQ notebook before a cell fails?
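For reference, every entry in the parameters list of the config.pbtxt files posted in this thread has the same key/string_value shape, so the flag can be added as one more entry of that shape. A minimal sketch that renders such an entry follows; render_parameter is a hypothetical helper, not part of Triton or the FIL backend.

```python
def render_parameter(key, value):
    """Render one entry for the `parameters [ ... ]` list in config.pbtxt.

    Booleans are written as the lowercase strings used in the configs
    posted in this thread ("true" / "false").
    """
    if isinstance(value, bool):
        value = "true" if value else "false"
    return f'{{ key: "{key}" value: {{ string_value: "{value}" }} }}'


print(render_parameter("use_experimental_optimizations", True))
```

The printed line goes inside the existing parameters [ ... ] list, separated from the previous entry by a comma.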
NVIDIA Release 23.08 (build 66820947) Triton Server Version 2.37.0
Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/ .

========= config.pbtxt ============
name: "fil" # Name of the model directory (fil in our case)
backend: "fil" # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 9 ] # Input feature dimensions, in our sample case it's 9
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1 ] # Output 2 for binary classification model
  }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "predict_proba" value: { string_value: "true" } },
  { key: "output_class" value: { string_value: "true" } },
  { key: "threshold" value: { string_value: "0.5" } },
  { key: "algo" value: { string_value: "ALGO_AUTO" } },
  { key: "storage_type" value: { string_value: "AUTO" } },
  { key: "blocks_per_sm" value: { string_value: "0" } },
  { key: "use_experimental_optimizations" value: { string_value: "true" } }
]
root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver fil_23 tritonserver --model-repository=/models
W1129 06:31:14.927633 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1129 06:31:14.927720 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1129 06:31:14.929950 1 model_lifecycle.cc:462] loading: fil:1
I1129 06:31:14.938431 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1129 06:31:14.938468 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1129 06:31:14.938477 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1129 06:31:14.938513 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1129 06:31:14.940011 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
terminate called after throwing an instance of 'std::bad_alloc'
  what(): std::bad_alloc
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# python3 sample.py
Test Accuracy: 51.24
/usr/local/lib/python3.10/dist-packages/xgboost/core.py:160: UserWarning: [09:16:55] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using json or ubj. Model format will default to JSON in XGBoost 2.2 if not specified.
  warnings.warn(smsg, UserWarning)
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1030 09:17:00.890915 1358 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1030 09:17:00.892801 1358 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.893583 1358 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1030 09:17:00.895411 1358 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1030 09:17:00.896514 1358 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1030 09:17:00.933129 1358 model_lifecycle.cc:462] loading: fil:1
I1030 09:17:00.947223 1358 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1030 09:17:00.948097 1358 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.948809 1358 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1030 09:17:00.950459 1358 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1030 09:17:00.988559 1358 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory
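The UserWarning in this output says the binary model format is deprecated and suggests json or ubj. If the model is re-saved as JSON, the model_type in config.pbtxt has to change with it. The following is a minimal sketch of that pairing, assuming the FIL backend's "xgboost" and "xgboost_json" model types and that XGBoost's save_model picks the format from the file extension; json_path_for and model_type_for are hypothetical helpers, and this is not presented as a fix for the out-of-memory error.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to FIL model_type; verify it against
# the documentation of the FIL backend version in use.
FIL_MODEL_TYPES = {
    ".model": "xgboost",       # legacy binary format named in the warning
    ".json": "xgboost_json",   # JSON format suggested by the warning
}


def json_path_for(binary_path):
    """Return the same path with a .json extension instead of .model."""
    return PurePosixPath(binary_path).with_suffix(".json")


def model_type_for(path):
    """Return the assumed model_type for a saved model file, or raise ValueError."""
    suffix = PurePosixPath(path).suffix
    if suffix not in FIL_MODEL_TYPES:
        raise ValueError(f"no known model_type for extension {suffix!r}")
    return FIL_MODEL_TYPES[suffix]
```

In sample.py this would mean passing str(json_path_for(...)) to model.save_model and setting the model_type parameter to the value returned by model_type_for.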