triton-inference-server / fil_backend

FIL backend for the Triton Inference Server
Apache License 2.0

Unable to run fil backend on local docker #325

Closed andompesta closed 1 year ago

andompesta commented 1 year ago

I'm trying to run an XGBoost model on Triton Server on my local Mac M1. I'm using podman instead of Docker, but running the command

docker run -p 8000:8000 -p 8001:8001 -p 8002:8002 -v .../model_repository:/models --name tritonserver nvcr.io/nvidia/tritonserver:22.05-py3 tritonserver --model-repository=/models

returns the error:

W0109 20:52:58.855072 1 pinned_memory_manager.cc:236] Unable to allocate pinned system memory, pinned memory pool will not be available: no CUDA-capable device is detected
I0109 20:52:58.855582 1 cuda_memory_manager.cc:115] CUDA memory pool disabled
I0109 20:52:58.891440 1 model_repository_manager.cc:1191] loading: XXXX:1
E0109 20:52:59.001918 1 model_repository_manager.cc:1348] failed to load 'XXXX' version 1: Invalid argument: unable to find 'libtriton_fil.so' for model 'XXXX, searched: /models/XXXX/1, /models/XXXX, /opt/tritonserver/backends/fil
I0109 20:52:59.002301 1 server.cc:556] 
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0109 20:52:59.002970 1 server.cc:583] 
+---------+------+--------+
| Backend | Path | Config |
+---------+------+--------+
+---------+------+--------+

I0109 20:52:59.003417 1 server.cc:626] 
+-------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Model                         | Version | Status                                                                                                                                                                                                                               |
+-------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| view-to-click-xgboost-catalog | 1       | UNAVAILABLE: Invalid argument: unable to find 'libtriton_fil.so' for model 'XXXX', searched: /models/XXXX/1, /models/XXXX, /opt/tritonserver/backends/fil |
+-------------------------------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0109 20:52:59.003679 1 tritonserver.cc:2138] 
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                                                                                        |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                                                                                       |
| server_version                   | 2.22.0                                                                                                                                                                                       |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |
| model_repository_path[0]         | /models                                                                                                                                                                                      |
| model_control_mode               | MODE_NONE                                                                                                                                                                                    |
| strict_model_config              | 1                                                                                                                                                                                            |
| rate_limit                       | OFF                                                                                                                                                                                          |
| pinned_memory_pool_byte_size     | 268435456                                                                                                                                                                                    |
| response_cache_byte_size         | 0                                                                                                                                                                                            |
| min_supported_compute_capability | 6.0                                                                                                                                                                                          |
| strict_readiness                 | 1                                                                                                                                                                                            |
| exit_timeout                     | 30                                                                                                                                                                                           |
+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I0109 20:52:59.003867 1 server.cc:257] Waiting for in-flight requests to complete.
I0109 20:52:59.003940 1 server.cc:273] Timeout 30: Found 0 model versions that have in-flight inferences
I0109 20:52:59.004050 1 server.cc:288] All models are stopped, unloading models
I0109 20:52:59.004137 1 server.cc:295] Timeout 30: Found 0 live models and 0 in-flight non-inference requests
error: creating server: Internal - failed to load all models

config.pbtxt looks like

name: "XXXX"
backend: "fil"
max_batch_size: 500

instance_group [
  {
    count: 1
    kind: KIND_CPU
  }
]

input [
  {
    name: "input__0"
    data_type: TYPE_FP32
    dims: [ 293 ]
  }
]

output [
  {
    name: "output__0"
    data_type: TYPE_FP32
    dims: [ 2 ]
  }
]

parameters [
  {
    key: "model_type"
    value: { string_value: "xgboost_json" }
  },
  {
    key: "output_class"
    value: { string_value: "false" }
  },
  {
    key: "predict_proba"
    value: { string_value: "true" }
  }
]

dynamic_batching {}
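The config above declares a single FP32 input of 293 features and a 2-element output (class probabilities, since `predict_proba` is `"true"`). Once the backend actually loads, a request against this model could be sketched as below — a minimal example assuming the `tritonclient` package, a server on `localhost:8000`, and the redacted model name standing in as `"XXXX"`:

```python
import numpy as np


def run_inference(url="localhost:8000", model_name="XXXX"):
    # Requires a running Triton server with the model loaded; tritonclient
    # is imported lazily so the rest of this sketch runs without it.
    import tritonclient.http as httpclient

    client = httpclient.InferenceServerClient(url=url)
    # One row of 293 FP32 features, matching dims: [ 293 ] in config.pbtxt.
    batch = np.random.rand(1, 293).astype(np.float32)
    infer_input = httpclient.InferInput("input__0", batch.shape, "FP32")
    infer_input.set_data_from_numpy(batch)
    result = client.infer(
        model_name,
        [infer_input],
        outputs=[httpclient.InferRequestedOutput("output__0")],
    )
    # With predict_proba enabled, each row yields 2 probabilities.
    return result.as_numpy("output__0")


# Shape sanity check that needs no server: the batch must match the config.
example = np.random.rand(1, 293).astype(np.float32)
assert example.shape == (1, 293) and example.dtype == np.float32
```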

wphicks commented 1 year ago

Unfortunately, the FIL backend has not yet been released for ARM. There is a very intermittent bug that occasionally causes incorrect results to be returned on ARM machines. We're still tracking it down, but we don't want anything that could possibly alter a model's results to make it into a release.

If this is just for testing purposes, you should be able to build using these instructions (possibly with a slight modification of the script for Mac). I would not recommend using an ARM build in production, however, until we provide an official ARM release.
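Until that release lands, a quick way to confirm you're hitting this situation is to check, inside the container, which architecture it sees and whether the library Triton is searching for actually ships in the image — a small hedged check script (the backend path is the one the error log reports searching):

```shell
# Run inside the Triton container (e.g. via `podman run --rm -it ... bash`).
arch="$(uname -m)"
echo "architecture: ${arch}"

if [ -f /opt/tritonserver/backends/fil/libtriton_fil.so ]; then
  echo "FIL backend found"
else
  echo "FIL backend missing (expected on aarch64 images until the ARM release)"
fi
```

On an M1 Mac, podman runs aarch64 images by default, so `uname -m` will report `aarch64` and the FIL library will be absent — matching the `unable to find 'libtriton_fil.so'` error above.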

I'm going to close this for now since the underlying issue is really that we just need to release for ARM, but please feel free to either re-open or ping me in this thread for further help with workarounds until that release happens.