triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you better understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

Model-analyser, on remote host and docker. #70

Closed alphapibeta closed 3 years ago

alphapibeta commented 3 years ago

Can you please help me with this issue?

I have generated models that work with Triton 20.09 in a standalone Triton Inference Server container. I have built Model Analyzer, which by default targets Triton 20.11. When I pass models and plugins generated with the 20.09 container, Model Analyzer fails to load them because it expects 20.11. On the other hand, when I generate the models and plugins with the 20.11 TensorRT NGC container and load them into the 20.11 Model Analyzer, everything runs without issue. My requirement is to load the models and plugins generated for 20.09 into Model Analyzer.

Running the Model Analyzer container:

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash

The models and plugins given in the above command were generated for 20.09-py3. They load fine with the 20.09-py3 Triton Inference Server.

Command inside the docker container:

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3

Error

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3
2021-01-23 19:39:10.854 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'docker', 'triton_version': '20.09-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:39:10.859 INFO[entrypoint.py:105] Starting a Triton Server using docker...
2021-01-23 19:39:10.859 INFO[driver.py:236] init
2021-01-23 19:39:13.687 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:39:14.714 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:39:14.714 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:39:15.737 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:39:21.852 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] failed to load 'yolo1', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INTERNAL] failed to load 'yolo1', no version is available
2021-01-23 19:39:21.854 INFO[server_docker.py:128] Stopping triton server.

Also, how do we run Model Analyzer in remote mode?

Standalone inference server (20.09-py3):

sudo docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v/home/ubuntu/cuda/sec_models:/models -v/home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" bdb0cbe1c039 tritonserver --model-repository=/models --grpc-infer-allocation-pool-size=512 --log-verbose 1

Output:

I0123 19:44:29.564053 1 grpc_server.cc:2078] Thread started for ModelStreamInferHandler
I0123 19:44:29.564070 1 grpc_server.cc:3897] Started GRPCInferenceService at 0.0.0.0:8001
I0123 19:44:29.564351 1 http_server.cc:2705] Started HTTPService at 0.0.0.0:8000
I0123 19:44:29.605837 1 http_server.cc:2724] Started Metrics Service at 0.0.0.0:8002

Model Analyzer command:

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash

Inside the docker container:

model-analyzer -m /models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode remote --triton-grpc-endpoint localhost:8001
2021-01-23 19:53:10.191 INFO[entrypoint.py:368] Triton Model Analyzer started: config={'model_repository': '/models/', 'model_names': 'yolo1', 'batch_sizes': '1', 'concurrency': '1', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'remote', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-23 19:53:10.197 INFO[entrypoint.py:84] Using remote Triton Server...
2021-01-23 19:53:10.199 INFO[entrypoint.py:209] Triton Server is ready.
2021-01-23 19:53:10.199 INFO[driver.py:236] init
2021-01-23 19:53:11.299 INFO[entrypoint.py:383] Starting perf_analyzer...
2021-01-23 19:53:11.299 INFO[analyzer.py:91] Profiling server only metrics...
2021-01-23 19:53:12.323 INFO[monitor.py:74] Using GPU(s) with UUID(s) = { GPU-5df6aea1-a690-25ee-c16e-bd46a1d95792 } for the analysis.
2021-01-23 19:53:18.438 ERROR[entrypoint.py:387] Model Analyzer encountered an error: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.UNAVAILABLE] explicit model load / unload is not allowed if polling is enabled
root@tensorgo-rppg:/opt/triton-model-analyzer# 
Tabrizian commented 3 years ago

@alphapibeta Regarding your first question, there is a bug in Model Analyzer currently that requires the path inside the container to be the same as the path outside the container.

For now, I recommend mounting the models at the same path as on the host machine.

sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/cuda/sec_models:/home/ubuntu/cuda/sec_models -v /home/ubuntu/cuda/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash
model-analyzer -m /home/ubuntu/cuda/sec_models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode docker --triton-version 20.09-py3

/cc @xprotobeast2

Regarding your second question, you need to start tritonserver with the --model-control-mode=explicit flag when you want to use remote mode. I'll update the docs to reflect this. Thanks for pointing this out.
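
For example, a minimal sketch of how the two pieces fit together (assuming the standard NGC tritonserver image tag and the same model repository path as above; the plugin mounts and LD_PRELOAD settings from the earlier commands would be added in the same way):

# 1) Start the Triton server with explicit model control (hypothetical image tag shown):
sudo docker run --gpus all --rm --net host -v /home/ubuntu/cuda/sec_models:/home/ubuntu/cuda/sec_models nvcr.io/nvidia/tritonserver:20.11-py3 tritonserver --model-repository=/home/ubuntu/cuda/sec_models --model-control-mode=explicit

# 2) Point Model Analyzer at the running server in remote mode:
model-analyzer -m /home/ubuntu/cuda/sec_models/ -n yolo1 --batch-size 1 -c 1 --triton-launch-mode remote --triton-grpc-endpoint localhost:8001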

naveengogineni commented 3 years ago

@Tabrizian I tried your suggestion (starting tritonserver with the --model-control-mode=explicit flag when using remote mode) for executing Model Analyzer in remote mode (the second question above), and below are the results.

Prerequisites:
1. Plugins built for Triton Inference Server 20.11
2. Model repo path is mapped as-is into the docker container
3. Triton server started with --model-control-mode=explicit

Please find below the commands used and the results (a new error); your response would be highly appreciated.

Starting Triton Server:

ubuntu@xxxxxxx-inference-2:~$ sudo docker run --gpus all --rm --shm-size=1g --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -p8000:8000 -p8001:8001 -p8002:8002 -v/home/ubuntu/sec_models:/home/ubuntu/sec_models -v/home/ubuntu/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" 29df4d808bc0 tritonserver --model-control-mode=explicit --model-repository=/home/ubuntu/sec_models --grpc-infer-allocation-pool-size=16 --log-verbose 1

=============================
== Triton Inference Server ==
=============================

NVIDIA Release 20.11 (build )

Copyright (c) 2018-2020, NVIDIA CORPORATION. All rights reserved.

Various files include modifications (c) NVIDIA CORPORATION. All rights reserved. NVIDIA modifications are covered by the license terms that apply to the underlying project or file.

I0124 04:32:43.126250 1 metrics.cc:219] Collecting metrics for GPU 0: Tesla P100-SXM2-16GB
I0124 04:32:43.383000 1 pinned_memory_manager.cc:199] Pinned memory pool is created at '0x7f985a000000' with size 268435456
I0124 04:32:43.387117 1 cuda_memory_manager.cc:99] CUDA memory pool is created on device 0 with size 67108864
I0124 04:32:43.387839 1 backend_factory.h:44] Create TritonBackendFactory
I0124 04:32:43.387885 1 plan_backend_factory.cc:48] Create PlanBackendFactory
I0124 04:32:43.387895 1 plan_backend_factory.cc:55] Registering TensorRT Plugins
I0124 04:32:43.387945 1 logging.cc:52] Registered plugin creator - ::BatchTilePlugin_TRT version 1
I0124 04:32:43.387962 1 logging.cc:52] Registered plugin creator - ::BatchedNMS_TRT version 1
I0124 04:32:43.387976 1 logging.cc:52] Registered plugin creator - ::BatchedNMSDynamic_TRT version 1
I0124 04:32:43.387996 1 logging.cc:52] Registered plugin creator - ::CoordConvAC version 1
I0124 04:32:43.388007 1 logging.cc:52] Registered plugin creator - ::CropAndResize version 1
I0124 04:32:43.388018 1 logging.cc:52] Registered plugin creator - ::DetectionLayer_TRT version 1
I0124 04:32:43.388029 1 logging.cc:52] Registered plugin creator - ::FlattenConcat_TRT version 1
I0124 04:32:43.388039 1 logging.cc:52] Registered plugin creator - ::GenerateDetection_TRT version 1
I0124 04:32:43.388053 1 logging.cc:52] Registered plugin creator - ::GridAnchor_TRT version 1
I0124 04:32:43.388063 1 logging.cc:52] Registered plugin creator - ::GridAnchorRect_TRT version 1
I0124 04:32:43.388074 1 logging.cc:52] Registered plugin creator - ::InstanceNormalization_TRT version 1
I0124 04:32:43.388084 1 logging.cc:52] Registered plugin creator - ::LReLU_TRT version 1
I0124 04:32:43.388095 1 logging.cc:52] Registered plugin creator - ::MultilevelCropAndResize_TRT version 1
I0124 04:32:43.388106 1 logging.cc:52] Registered plugin creator - ::MultilevelProposeROI_TRT version 1
I0124 04:32:43.388117 1 logging.cc:52] Registered plugin creator - ::NMS_TRT version 1
I0124 04:32:43.388128 1 logging.cc:52] Registered plugin creator - ::Normalize_TRT version 1
I0124 04:32:43.388147 1 logging.cc:52] Registered plugin creator - ::PriorBox_TRT version 1
I0124 04:32:43.388158 1 logging.cc:52] Registered plugin creator - ::ProposalLayer_TRT version 1
I0124 04:32:43.388169 1 logging.cc:52] Registered plugin creator - ::Proposal version 1
I0124 04:32:43.388179 1 logging.cc:52] Registered plugin creator - ::PyramidROIAlign_TRT version 1
I0124 04:32:43.388189 1 logging.cc:52] Registered plugin creator - ::Region_TRT version 1
I0124 04:32:43.388199 1 logging.cc:52] Registered plugin creator - ::Reorg_TRT version 1
I0124 04:32:43.388209 1 logging.cc:52] Registered plugin creator - ::ResizeNearest_TRT version 1
I0124 04:32:43.388223 1 logging.cc:52] Registered plugin creator - ::RPROI_TRT version 1
I0124 04:32:43.388233 1 logging.cc:52] Registered plugin creator - ::SpecialSlice_TRT version 1
I0124 04:32:43.388243 1 logging.cc:52] Registered plugin creator - ::Split version 1
I0124 04:32:43.388261 1 libtorch_backend_factory.cc:53] Create LibTorchBackendFactory
I0124 04:32:43.388274 1 custom_backend_factory.cc:46] Create CustomBackendFactory
I0124 04:32:43.388286 1 ensemble_backend_factory.cc:47] Create EnsembleBackendFactory
I0124 04:32:43.388357 1 server.cc:141]
+---------+--------+------+
| Backend | Config | Path |
+---------+--------+------+
+---------+--------+------+

I0124 04:32:43.388368 1 model_repository_manager.cc:451] BackendStates()
I0124 04:32:43.388392 1 server.cc:184]
+-------+---------+--------+
| Model | Version | Status |
+-------+---------+--------+
+-------+---------+--------+

I0124 04:32:43.388546 1 tritonserver.cc:1620]
+----------------------------------+----------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.5.0 |
| server_extensions | classification sequence model_repository schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics |
| model_repository_path[0] | /home/ubuntu/sec_models |
| model_control_mode | MODE_EXPLICIT |
| strict_model_config | 1 |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
+----------------------------------+----------------------------------------------------------------------+

I0124 04:32:43.389999 1 grpc_server.cc:225] Ready for RPC 'ServerLive', 0
I0124 04:32:43.390035 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 0
I0124 04:32:43.390044 1 grpc_server.cc:225] Ready for RPC 'ModelReady', 0
I0124 04:32:43.390053 1 grpc_server.cc:225] Ready for RPC 'ServerMetadata', 0
I0124 04:32:43.390063 1 grpc_server.cc:225] Ready for RPC 'ModelMetadata', 0
I0124 04:32:43.390073 1 grpc_server.cc:225] Ready for RPC 'ModelConfig', 0
I0124 04:32:43.390080 1 grpc_server.cc:225] Ready for RPC 'ModelStatistics', 0
I0124 04:32:43.390091 1 grpc_server.cc:225] Ready for RPC 'SystemSharedMemoryStatus', 0
I0124 04:32:43.390101 1 grpc_server.cc:225] Ready for RPC 'SystemSharedMemoryRegister', 0
I0124 04:32:43.390110 1 grpc_server.cc:225] Ready for RPC 'SystemSharedMemoryUnregister', 0
I0124 04:32:43.390118 1 grpc_server.cc:225] Ready for RPC 'CudaSharedMemoryStatus', 0
I0124 04:32:43.390127 1 grpc_server.cc:225] Ready for RPC 'CudaSharedMemoryRegister', 0
I0124 04:32:43.390137 1 grpc_server.cc:225] Ready for RPC 'CudaSharedMemoryUnregister', 0
I0124 04:32:43.390146 1 grpc_server.cc:225] Ready for RPC 'RepositoryIndex', 0
I0124 04:32:43.390153 1 grpc_server.cc:225] Ready for RPC 'RepositoryModelLoad', 0
I0124 04:32:43.390163 1 grpc_server.cc:225] Ready for RPC 'RepositoryModelUnload', 0
I0124 04:32:43.390205 1 grpc_server.cc:416] Thread started for CommonHandler
I0124 04:32:43.390516 1 grpc_server.cc:3082] New request handler for ModelInferHandler, 1
I0124 04:32:43.390584 1 grpc_server.cc:2146] Thread started for ModelInferHandler
I0124 04:32:43.390843 1 grpc_server.cc:3427] New request handler for ModelStreamInferHandler, 3
I0124 04:32:43.390948 1 grpc_server.cc:2146] Thread started for ModelStreamInferHandler
I0124 04:32:43.390967 1 grpc_server.cc:3979] Started GRPCInferenceService at 0.0.0.0:8001
I0124 04:32:43.391403 1 http_server.cc:2717] Started HTTPService at 0.0.0.0:8000
I0124 04:32:43.433721 1 http_server.cc:2736] Started Metrics Service at 0.0.0.0:8002
W0124 04:32:45.128899 1 metrics.cc:320] failed to get energy consumption for GPU 0: Not Supported
W0124 04:32:47.131223 1 metrics.cc:320] failed to get energy consumption for GPU 0: Not Supported
W0124 04:32:49.136853 1 metrics.cc:320] failed to get energy consumption for GPU 0: Not Supported
I0124 04:33:14.710252 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 0 step START
I0124 04:33:14.710292 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 1
I0124 04:33:14.710302 1 model_repository_manager.cc:451] BackendStates()
I0124 04:33:14.710358 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 0 step COMPLETE
I0124 04:33:14.710367 1 grpc_server.cc:408] Done for ServerReady, 0
I0124 04:33:15.799709 1 http_server.cc:171] HTTP request: 0 /metrics

Starting the Model Analyzer Docker:

ubuntu@xxxxx-inference-2:~$ sudo docker run --gpus 1 --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 -ti -v /var/run/docker.sock:/var/run/docker.sock --net host --privileged -v /home/ubuntu/sec-models/:/home/ubuntu/sec-models/ -v /home/ubuntu/plugins/:/plugins --env LD_PRELOAD="/plugins/libyolo_layer.so:/plugins/libdecodeplugin.so" triton_modelanalyzer bash
root@tensorgo-inference-2:/opt/triton-model-analyzer#

Run Model Analyzer inside Docker:

root@xxxxxxxxx-inference-2:/opt/triton-model-analyzer# model-analyzer -m /home/ubuntu/sec-models/ -n retinaface-16 --batch-sizes 1,2,4,8,16 -c 1,2,3
2021-01-24 04:33:14.691 INFO[entrypoint.py:381] Triton Model Analyzer started: config={'model_repository': '/home/ubuntu/sec-models/', 'model_names': 'retinaface-16', 'batch_sizes': '1,2,4,8,16', 'concurrency': '1,2,3', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'local', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-24 04:33:14.695 INFO[entrypoint.py:91] Starting a local Triton Server...
2021-01-24 04:33:14.707 INFO[server_local.py:63] Triton Server started.
2021-01-24 04:33:14.710 INFO[entrypoint.py:207] Triton Server is ready.
2021-01-24 04:33:14.711 INFO[driver.py:236] init
2021-01-24 04:33:15.801 INFO[entrypoint.py:396] Starting perf_analyzer...
2021-01-24 04:33:15.801 INFO[analyzer.py:95] Profiling server only metrics...
2021-01-24 04:33:16.824 INFO[gpu_monitor.py:74] Using GPU(s) with UUID(s) = { GPU-571b5a2c-496c-573a-a773-b251209b15d3 } for the analysis.
2021-01-24 04:33:22.956 ERROR[entrypoint.py:400] Model Analyzer encountered an error: Unable to load the model : [StatusCode.INTERNAL] failed to load 'retinaface-16', no version is available
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 79, in load_model
    self._client.load_model(model.name())
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 555, in load_model
    raise_error_grpc(rpc_error)
  File "/usr/local/lib/python3.6/dist-packages/tritonclient/grpc/__init__.py", line 61, in raise_error_grpc
    raise get_error_grpc(rpc_error) from None
tritonclient.utils.InferenceServerException: [StatusCode.INTERNAL] failed to load 'retinaface-16', no version is available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 397, in main
    run_analyzer(config, analyzer, client, run_configs)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 323, in run_analyzer
    client.load_model(model=model)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 82, in load_model
    f"Unable to load the model : {e}")
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Unable to load the model : [StatusCode.INTERNAL] failed to load 'retinaface-16', no version is available
2021-01-24 04:33:22.957 INFO[server_local.py:80] Triton Server stopped.

naveengogineni commented 3 years ago

@Tabrizian

Below is the log seen on the Triton Server when the command above (which ran into the error) is executed from Model Analyzer inside Docker:

I0124 16:02:07.090487 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 0 step START
I0124 16:02:07.090528 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 1
I0124 16:02:07.090539 1 model_repository_manager.cc:451] BackendStates()
I0124 16:02:07.090609 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 0 step COMPLETE
I0124 16:02:07.090701 1 grpc_server.cc:408] Done for ServerReady, 0
I0124 16:02:08.181914 1 http_server.cc:171] HTTP request: 0 /metrics
I0124 16:02:15.335941 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step START
I0124 16:02:15.335991 1 grpc_server.cc:225] Ready for RPC 'RepositoryModelLoad', 1
I0124 16:02:15.336274 1 model_repository_manager.cc:578] AsyncLoad() 'retinaface-16'
I0124 16:02:15.336339 1 model_repository_manager.cc:475] VersionStates() 'retinaface-16'
I0124 16:02:15.337065 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step WRITEREADY
I0124 16:02:15.337154 1 grpc_server.cc:270] Process for RepositoryModelLoad, rpc_ok=1, 0 step COMPLETE
I0124 16:02:15.337162 1 grpc_server.cc:408] Done for RepositoryModelLoad, 0

aramesh7 commented 3 years ago

@naveengogineni

If I understand you correctly, you want to attach model analyzer to a running instance of Triton (i.e. remote mode of Model Analyzer). It seems you are running Model Analyzer with the default --triton-launch-mode. You will need to use --triton-launch-mode=remote --triton-grpc-endpoint=localhost:8001 --triton-metrics-endpoint=localhost:8002

Additionally, please check whether Triton is able to find the model at the specified path /home/ubuntu/sec_models. This can happen if the mounted directory is empty inside the docker container.
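
As a quick sanity check (a sketch, not from the thread, assuming the same mount paths used in your docker run command), you can verify that the repository is actually visible inside the running server container:

# find the running Triton container, then list the mounted model repository inside it
sudo docker ps
sudo docker exec -it <triton_container_id> ls /home/ubuntu/sec_models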

naveengogineni commented 3 years ago

@xprotobeast2

Yes, you understood it correctly; we want to attach Model Analyzer to a running instance of Triton (i.e., remote mode of Model Analyzer). Thanks for pointing to the --triton-launch-mode=remote argument. Please find the results below:

root@xxxx-inference-2:/opt/triton-model-analyzer# model-analyzer -m /home/ubuntu/sec-models/ -n retinaface-16 --batch-sizes 1,2,4,8,16 -c 1,2,3 --triton-launch-mode=remote --triton-grpc-endpoint=localhost:8001 --triton-metrics-url=localhost:8002
2021-01-25 03:41:33.467 INFO[entrypoint.py:381] Triton Model Analyzer started: config={'model_repository': '/home/ubuntu/sec-models/', 'model_names': 'retinaface-16', 'batch_sizes': '1,2,4,8,16', 'concurrency': '1,2,3', 'export': None, 'export_path': '.', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_server_only': 'metrics-server-only.csv', 'max_retries': 100, 'duration_seconds': 5.0, 'monitoring_interval': 0.01, 'client_protocol': 'grpc', 'perf_analyzer_path': 'perf_analyzer', 'perf_measurement_window': 5000, 'no_perf_output': None, 'triton_launch_mode': 'remote', 'triton_version': '20.11-py3', 'log_level': 'INFO', 'triton_http_endpoint': 'localhost:8000', 'triton_grpc_endpoint': 'localhost:8001', 'triton_metrics_url': 'localhost:8002', 'triton_server_path': 'tritonserver', 'triton_output_path': None, 'gpus': ['all'], 'config_file': None}
2021-01-25 03:41:33.472 INFO[entrypoint.py:80] Using remote Triton Server...
2021-01-25 03:41:38.540 ERROR[entrypoint.py:400] Model Analyzer encountered an error: Could not determine server readiness. Number of retries exceeded.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 384, in main
    client, server = get_triton_handles(config)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/entrypoint.py", line 206, in get_triton_handles
    client.wait_for_server_ready(num_retries=config.max_retries)
  File "/usr/local/lib/python3.6/dist-packages/model_analyzer/triton/client/client.py", line 58, in wait_for_server_ready
    "Could not determine server readiness. "
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: Could not determine server readiness. Number of retries exceeded.

Below is the log seen on the Triton Server when the command above (which ran into the error) is executed from Model Analyzer inside Docker:

I0125 03:40:46.181731 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 301 step COMPLETE
I0125 03:40:46.181854 1 grpc_server.cc:408] Done for ServerReady, 301
I0125 03:40:46.232328 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 302 step START
I0125 03:40:46.232368 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 303
I0125 03:40:46.232376 1 model_repository_manager.cc:451] BackendStates()
I0125 03:40:46.232554 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 302 step COMPLETE
I0125 03:40:46.232602 1 grpc_server.cc:408] Done for ServerReady, 302
.............................................................
I0125 03:41:38.338199 1 grpc_server.cc:408] Done for ServerReady, 497
I0125 03:41:38.388608 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 498 step START
I0125 03:41:38.388634 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 499
I0125 03:41:38.388641 1 model_repository_manager.cc:451] BackendStates()
I0125 03:41:38.388678 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 498 step COMPLETE
I0125 03:41:38.388685 1 grpc_server.cc:408] Done for ServerReady, 498
I0125 03:41:38.439095 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 499 step START
I0125 03:41:38.439124 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 500
I0125 03:41:38.439149 1 model_repository_manager.cc:451] BackendStates()
I0125 03:41:38.439187 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 499 step COMPLETE
I0125 03:41:38.439194 1 grpc_server.cc:408] Done for ServerReady, 499
I0125 03:41:38.489677 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 500 step START
I0125 03:41:38.489706 1 grpc_server.cc:225] Ready for RPC 'ServerReady', 501
I0125 03:41:38.489713 1 model_repository_manager.cc:451] BackendStates()
I0125 03:41:38.489748 1 grpc_server.cc:270] Process for ServerReady, rpc_ok=1, 500 step COMPLETE
I0125 03:41:38.489755 1 grpc_server.cc:408] Done for ServerReady, 500

naveengogineni commented 3 years ago

@xprotobeast2

We are now able to load the models onto the remote Triton Inference Server explicitly from Model Analyzer, and we have benchmarked the models (with and without plugins). Thanks a lot for your support.

aramesh7 commented 3 years ago

@naveengogineni I'm glad it worked out. What was the issue near the end? Sometimes you may need to adjust --max-retries.
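
For example, a hypothetical invocation reusing the flags already shown in this thread, with a higher retry count:

# raise the readiness-check retry limit from its default (100 in the config dump above)
model-analyzer -m /home/ubuntu/sec-models/ -n retinaface-16 --batch-sizes 1,2,4,8,16 -c 1,2,3 --triton-launch-mode=remote --triton-grpc-endpoint=localhost:8001 --max-retries 300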

naveengogineni commented 3 years ago

@xprotobeast2 Actually, after restarting the Triton Server with --model-control-mode=explicit, I tried to load a model that does not exist in the repo, which is when I hit this error (Number of retries exceeded) in the second run. These flags (--model-control-mode=explicit, --triton-launch-mode remote) worked when the model to be loaded does exist in the repo.
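
As a sketch of how to confirm which models the server can actually see before running Model Analyzer (assuming the HTTP endpoint is reachable on port 8000; Triton's repository extension exposes a repository index API):

# ask Triton for its model repository index; a model missing from this list cannot be loaded explicitly
curl -s -X POST localhost:8000/v2/repository/index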