triton-inference-server / fil_backend

FIL backend for the Triton Inference Server
Apache License 2.0

LLVM ERROR: out of memory #368

Open sandeepb2013 opened 11 months ago

sandeepb2013 commented 11 months ago

root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost# python3 sample.py
Test Accuracy: 51.24
/usr/local/lib/python3.10/dist-packages/xgboost/core.py:160: UserWarning: [09:16:55] WARNING: /workspace/src/c_api/c_api.cc:1240: Saving into deprecated binary model format, please consider using json or ubj. Model format will default to JSON in XGBoost 2.2 if not specified.
  warnings.warn(smsg, UserWarning)
root@2ff024ed2346:/opt/tritonserver/tmp/simple-xgboost#
WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1030 09:17:00.890915 1358 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1030 09:17:00.892801 1358 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.893583 1358 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1030 09:17:00.895411 1358 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1030 09:17:00.896514 1358 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1030 09:17:00.933129 1358 model_lifecycle.cc:462] loading: fil:1
I1030 09:17:00.947223 1358 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1030 09:17:00.948097 1358 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1030 09:17:00.948809 1358 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1030 09:17:00.950459 1358 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1030 09:17:00.988559 1358 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory

wphicks commented 11 months ago

Thank you for the report! Could you post how the model was generated and the model config file you used to load it into Triton?

wphicks commented 11 months ago

Possibly related: https://github.com/dmlc/treelite/issues/364. If that is indeed the underlying issue, the use_experimental_optimizations flag may be a workaround for the moment.
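For reference, use_experimental_optimizations is set as a string-valued parameter in config.pbtxt. A minimal sketch of just that entry (the rest of the parameters block omitted here):

parameters [
  {
    key: "use_experimental_optimizations"
    value: { string_value: "true" }
  }
]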

sandeepb2013 commented 11 months ago

Hi @wphicks, thanks for your quick response. Sorry for the late reply.

For model generation and saving, I used the following script:


# Import required libraries
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

# Generate dummy data to perform binary classification
seed = 7
features = 9      # number of sample features
samples = 10000   # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

# Create directory to save the model
# Save your xgboost model as xgboost.model
# For more information on saving xgboost model check https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training
# Model can also be dumped to json format
model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(
    ["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"],
    stdout=subprocess.PIPE,
    preexec_fn=os.setsid,
)
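As an aside, the XGBoost deprecation warning in the log above suggests saving to JSON rather than the legacy binary format. A variant of the save step (an alternative sketch, not what was run here; with a JSON file the FIL backend expects the name xgboost.json and model_type "xgboost_json" in config.pbtxt):

# Alternative save step (sketch based on the deprecation warning, not part of
# the original reproduction): write JSON instead of the deprecated binary format.
model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.json')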

--------config-------

name: "fil"      # Name of the model directory (fil in our case)
backend: "fil"   # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 9 ]  # Input feature dimensions, in our sample case it's 9
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1 ]  # Output 2 for binary classification model
  }
]
instance_group [{ kind: KIND_CPU }]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "predict_proba" value: { string_value: "false" } },
  { key: "output_class" value: { string_value: "true" } },
  { key: "threshold" value: { string_value: "0.5" } },
  { key: "algo" value: { string_value: "ALGO_AUTO" } },
  { key: "storage_type" value: { string_value: "AUTO" } },
  { key: "blocks_per_sm" value: { string_value: "0" } }
]

sandeepb2013 commented 11 months ago

For building the Docker image:

https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/customization_guide/build.html

wphicks commented 11 months ago

Hmmm... I don't see why that particular model would trigger that Treelite issue, so we may need to dig deeper. Can you try the use_experimental_optimizations flag and let me know if you can successfully run the model with that flag?

wphicks commented 11 months ago

Apologies; I was too hasty when thinking about this before. As soon as I saw LLVM, I thought of Treelite-compiled models, but the FIL backend does not invoke (and has never invoked) Treelite-compiled models. CPU execution is performed through GTIL or our internal optimized CPU implementation.

Can you give us a little more detail on exactly how you got this error? Are there any more details available on the workflow? LLVM should not be involved with Triton at all at the deployment stage.
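For isolating this outside Triton, one option (sketched here assuming a Treelite 3.x-style API) is to load the same xgboost.model with Treelite directly and run a GTIL prediction, since the FIL backend's CPU path goes through Treelite/GTIL:

import numpy as np
import treelite

# Load the same file that Triton's FIL backend loads at instance initialization
# (Treelite 3.x API assumed; newer releases expose different loader functions).
tl_model = treelite.Model.load(
    '/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model',
    model_format='xgboost',
)

# Run a small GTIL prediction on random features shaped like the training data.
X = np.random.rand(4, 9).astype('float32')
print(treelite.gtil.predict(tl_model, X))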

sandeepb2013 commented 11 months ago

Hi @wphicks,

Using the build script (https://github.com/triton-inference-server/fil_backend/blob/main/docs/build.md), I was able to build two Docker images:
----------------------------------------
REPOSITORY             TAG      IMAGE ID       CREATED       SIZE
localhost/triton_fil   latest   8fdf060142f9   3 weeks ago   12.4 GB

After running the Docker image I was able to access the environment, but not the Jupyter notebook, so I created a Python script instead.

----------------- sample.py --------------------

# Import required libraries
import numpy
from numpy import loadtxt
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

import os
import signal
import subprocess

# Generate dummy data to perform binary classification
seed = 7
features = 9      # number of sample features
samples = 10000   # number of samples
X = numpy.random.rand(samples, features).astype('float32')
Y = numpy.random.randint(2, size=samples)

test_size = 0.33
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

model = XGBClassifier()
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Test Accuracy: {:.2f}".format(accuracy * 100.0))

# Create directory to save the model
# Save your xgboost model as xgboost.model
# For more information on saving xgboost model check https://xgboost.readthedocs.io/en/latest/python/python_intro.html#training
# Model can also be dumped to json format
model.save_model('/opt/tritonserver/notebooks/simple-xgboost/model_repository/fil/1/xgboost.model')

triton_process = subprocess.Popen(
    ["tritonserver", "--model-repository=/opt/tritonserver/notebooks/simple-xgboost/model_repository"],
    stdout=subprocess.PIPE,
    preexec_fn=os.setsid,
)

------------------------- config.pbtxt ---------------

name: "fil" # Name of the model directory (fil in our case) backend: "fil" # Triton FIL backend for deploying forest models max_batch_size: 8192 input [ { name: "input0" data_type: TYPE_FP32 dims: [ 9 ] # Input feature dimensions, in our sample case it's 9 } ] output [ { name: "output0" data_type: TYPE_FP32 dims: [ 1 ] # Output 2 for binary classification model } ] instance_group [{ kind: KIND_CPU }] parameters [ { key: "model_type" value: { string_value: "xgboost" } }, { key: "predict_proba" value: { string_value: "false" } }, { key: "output_class" value: { string_value: "true" } }, { key: "threshold" value: { string_value: "0.5" } }, { key: "algo" value: { string_value: "ALGO_AUTO" } }, { key: "storage_type" value: { string_value: "AUTO" } }, { key: "blocks_per_sm" value: { string_value: "0" } } ]


Finally, while running sample.py, the "LLVM ERROR: out of memory" message appears.

I cross-verified the model, the config.pbtxt, and the structure of the model repository.
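For reference, the layout implied by these paths and this config (directory name matching name: "fil", with the model file under a numbered version directory) is:

model_repository/
└── fil/
    ├── config.pbtxt
    └── 1/
        └── xgboost.model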

sandeepb2013 commented 11 months ago

[Screenshot attached: Screenshot 2023-11-07 at 2 11 34 PM]

sandeepb2013 commented 10 months ago

Hi @wphicks, any further pointers would really help. Thanks in advance.

sandeepb2013 commented 10 months ago

When I looked into this further, the other backend (pytorch) could be the reason for the LLVM issue. However, I'm more interested in trying out the FIL backend, so I kept only the FIL backend in the Triton backends directory, and I'm now facing the error below.

I1121 10:41:01.087972 1 model_lifecycle.cc:462] loading: fil:1
I1121 10:41:01.088345 1 backend_model.cc:364] Adding default backend config setting: default-max-batch-size,4
I1121 10:41:01.088435 1 shared_library.cc:112] OpenLibraryHandle: /opt/tritonserver/backends/fil/libtriton_fil.so
I1121 10:41:01.092161 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1121 10:41:01.092195 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1121 10:41:01.092203 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1121 10:41:01.092240 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1121 10:41:01.093017 1 model_config_utils.cc:1872] ModelConfig 64-bit fields:
I1121 10:41:01.093053 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_priority_level
I1121 10:41:01.093061 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I1121 10:41:01.093068 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I1121 10:41:01.093074 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_levels
I1121 10:41:01.093081 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::key
I1121 10:41:01.093088 1 model_config_utils.cc:1874] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I1121 10:41:01.093095 1 model_config_utils.cc:1874] ModelConfig::ensemble_scheduling::step::model_version
I1121 10:41:01.093102 1 model_config_utils.cc:1874] ModelConfig::input::dims
I1121 10:41:01.093110 1 model_config_utils.cc:1874] ModelConfig::input::reshape::shape
I1121 10:41:01.093117 1 model_config_utils.cc:1874] ModelConfig::instance_group::secondary_devices::device_id
I1121 10:41:01.093123 1 model_config_utils.cc:1874] ModelConfig::model_warmup::inputs::value::dims
I1121 10:41:01.093130 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I1121 10:41:01.093138 1 model_config_utils.cc:1874] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I1121 10:41:01.093145 1 model_config_utils.cc:1874] ModelConfig::output::dims
I1121 10:41:01.093152 1 model_config_utils.cc:1874] ModelConfig::output::reshape::shape
I1121 10:41:01.093159 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I1121 10:41:01.093166 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I1121 10:41:01.093173 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I1121 10:41:01.093234 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::dims
I1121 10:41:01.093244 1 model_config_utils.cc:1874] ModelConfig::sequence_batching::state::initial_state::dims
I1121 10:41:01.093252 1 model_config_utils.cc:1874] ModelConfig::version_policy::specific::versions
I1121 10:41:01.094102 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
I1121 10:41:01.094137 1 backend_model_instance.cc:69] Creating instance fil_0_0 on CPU using artifact ''
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc

sandeepb2013 commented 10 months ago

Do we have any specific minimum memory requirement for the FIL backend to start? Thanks.
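A related check (the paths below are assumptions and differ between cgroup v1 and v2): since this model is tiny, one quick way to rule out a container-level memory cap is to inspect free memory and the cgroup limit from inside the same container:

# Run inside the Triton container:
free -h
# cgroup v2 limit, falling back to the cgroup v1 path:
cat /sys/fs/cgroup/memory.max 2>/dev/null || cat /sys/fs/cgroup/memory/memory.limit_in_bytes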

wphicks commented 10 months ago

@sandeepb2013 Could you try with an officially-released Triton Docker image and enable use_experimental_optimizations in your config.pbtxt? The memory requirements should be quite modest, though they'll depend on the details of the model. If you still run into issues, can you see how far you get running either the fraud detection or FAQ notebook before a cell fails?

sandeepb2013 commented 10 months ago

root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver nvcr.io/nvidia/tritonserver:23.08-py3 tritonserver --model-repository=/models

"""

== Triton Inference Server ==

NVIDIA Release 23.08 (build 66820947) Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/ .

WARNING: [Torch-TensorRT] - Unable to read CUDA capable devices. Return status: 35
I1128 08:57:49.478413 1 libtorch.cc:2507] TRITONBACKEND_Initialize: pytorch
I1128 08:57:49.478480 1 libtorch.cc:2517] Triton TRITONBACKEND API version: 1.15
I1128 08:57:49.478494 1 libtorch.cc:2523] 'pytorch' TRITONBACKEND API version: 1.15
W1128 08:57:49.478588 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1128 08:57:49.478721 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1128 08:57:49.481254 1 model_lifecycle.cc:462] loading: fil:1
I1128 08:57:49.490362 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1128 08:57:49.490404 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1128 08:57:49.490413 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1128 08:57:49.490465 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1128 08:57:49.492124 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
LLVM ERROR: out of memory
[4f6fbf8992de:1 :0:54] Caught signal 11 (Segmentation fault: Sent by the kernel at address (nil))
==== backtrace (tid: 54) ====
 0 0x0000000000042520 sigaction() ???:0
 1 0x0000000000028898 abort() ???:0
 2 0x000000000261cdab getInferLibVersion() ???:0
 3 0x00000000000ae9a3 operator new() ???:0
 4 0x0000000000219479 std::vector<char, std::allocator >::_M_default_append() ???:0
 5 0x0000000000212e3a (anonymous namespace)::XGBTree::Load() xgboost.cc:0
 6 0x0000000000213fe4 (anonymous namespace)::ParseStream() xgboost.cc:0
 7 0x0000000000215d5a treelite::frontend::LoadXGBoostModel() ???:0
 8 0x00000000001656a5 triton::backend::fil::load_tl_base_model() ???:0
 9 0x00000000001deaad triton::backend::fil::RapidsModel::load() ???:0
10 0x00000000001e0860 triton::backend::rapids::triton_api::instance_initialize<triton::backend::rapids::TritonModelState, triton::backend::rapids::ModelInstanceState<triton::backend::fil::RapidsModel, triton::backend::fil::RapidsSharedState> >() ???:0
11 0x00000000001a0116 triton::core::TritonModelInstance::ConstructAndInitializeInstance() :0
12 0x00000000001a1356 triton::core::TritonModelInstance::CreateInstance() :0
13 0x0000000000185bd5 triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >)::{lambda()#1}::operator()() backend_model.cc:0
14 0x0000000000186216 std::_Function_handler<std::unique_ptr<std::future_base::_Result_base, std::future_base::_Result_base::_Deleter> (), std::future_base::_Task_setter<std::unique_ptr<std::future_base::_Result, std::future_base::_Result_base::_Deleter>, std::thread::_Invoker<std::tuple<triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >)::{lambda()#1}> >, triton::core::Status> >::_M_invoke() backend_model.cc:0
15 0x000000000019131d std::__future_base::_State_baseV2::_M_do_set() :0
16 0x0000000000099f68 pthread_mutexattr_setkind_np() ???:0
17 0x000000000017dadb std::future_base::_Deferred_state<std::thread::_Invoker<std::tuple<triton::core::TritonModel::PrepareInstances(inference::ModelConfig const&, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > >)::{lambda()#1}> >, triton::core::Status>::_M_complete_async() backend_model.cc:0
18 0x000000000018b865 triton::core::TritonModel::PrepareInstances() :0
19 0x0000000000190682 triton::core::TritonModel::Create() :0
20 0x0000000000273230 triton::core::ModelLifeCycle::CreateModel() :0
21 0x0000000000276923 std::_Function_handler<void (), triton::core::ModelLifeCycle::AsyncLoad(triton::core::ModelIdentifier const&, std::cxx11::basic_string<char, std::char_traits, std::allocator > const&, inference::ModelConfig const&, bool, bool, std::shared_ptr const&, std::function<void (triton::core::Status)>&&)::{lambda()#2}>::_M_invoke() model_lifecycle.cc:0
22 0x00000000003bfe52 std::thread::_State_impl<std::thread::_Invoker<std::tuple<triton::common::ThreadPool::ThreadPool(unsigned long)::{lambda()#1}> > >::_M_run() thread_pool.cc:0
23 0x00000000000dc253 std::error_code::default_error_condition() ???:0
24 0x0000000000094b43 pthread_condattr_setpshared() ???:0
25 0x0000000000125bb4 clone() ???:0

"""

sandeepb2013 commented 10 months ago

========= config.pbtxt ============

name: "fil"      # Name of the model directory (fil in our case)
backend: "fil"   # Triton FIL backend for deploying forest models
max_batch_size: 8192
input [
  {
    name: "input0"
    data_type: TYPE_FP32
    dims: [ 9 ]  # Input feature dimensions, in our sample case it's 9
  }
]
output [
  {
    name: "output0"
    data_type: TYPE_FP32
    dims: [ 1 ]  # Output 2 for binary classification model
  }
]
instance_group [{ kind: KIND_AUTO }]
parameters [
  { key: "model_type" value: { string_value: "xgboost" } },
  { key: "predict_proba" value: { string_value: "true" } },
  { key: "output_class" value: { string_value: "true" } },
  { key: "threshold" value: { string_value: "0.5" } },
  { key: "algo" value: { string_value: "ALGO_AUTO" } },
  { key: "storage_type" value: { string_value: "AUTO" } },
  { key: "blocks_per_sm" value: { string_value: "0" } },
  { key: "use_experimental_optimizations" value: { string_value: "true" } }
]

sandeepb2013 commented 10 months ago

root@lees1:~/work/fil_backend# docker run --rm --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -p 8000:8000 -p 8001:8001 -p 8002:8002 -v /root/work/fil_backend/models:/models --name tritonserver fil_23 tritonserver --model-repository=/models

=============================
== Triton Inference Server ==

NVIDIA Release 23.08 (build 66820947) Triton Server Version 2.37.0

Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

WARNING: The NVIDIA Driver was not detected. GPU functionality will not be available. Use the NVIDIA Container Toolkit to start this container with GPU support; see https://docs.nvidia.com/datacenter/cloud-native/ .

W1129 06:31:14.927633 1 pinned_memory_manager.cc:237] Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version
I1129 06:31:14.927720 1 cuda_memory_manager.cc:117] CUDA memory pool disabled
I1129 06:31:14.929950 1 model_lifecycle.cc:462] loading: fil:1
I1129 06:31:14.938431 1 initialize.hpp:43] TRITONBACKEND_Initialize: fil
I1129 06:31:14.938468 1 backend.hpp:47] Triton TRITONBACKEND API version: 1.15
I1129 06:31:14.938477 1 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.15
I1129 06:31:14.938513 1 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: fil (version 1)
I1129 06:31:14.940011 1 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: fil_0_0 (CPU device 0)
terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc