triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

Model Analyzer fails with error [StatusCode.UNAVAILABLE] Socket closed #599

feicccccccc closed this issue 10 months ago

feicccccccc commented 1 year ago

I am following the quick start and trying to run model-analyzer on peoplenet (which I set up based on deepstream-test3). I can run tritonserver and the model without any problems, but when I run model-analyzer on peoplenet, I encounter the following error.

setup:

[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA GeForce RTX 3080 Ti with ...

tritonserver model repository:

├── models
│   ├── peoplenet
│   │   ├── 1
│   │   │   └── resnet34_peoplenet_int8.etlt_b60_gpu0_int8.engine
│   │   └── config.pbtxt

cmd:

model-analyzer profile --model-repository /home/user/Documents/models/ --profile-models peoplenet --triton-launch-mode=docker --output-model-repository /home/user/Documents/model_analyzer/output/peoplenet/ --override-output-model-repository

error message:

[Model Analyzer] Starting a Triton Server using docker
[Model Analyzer] Loaded checkpoint from file /workspace/checkpoints/2.ckpt
[Model Analyzer] Profiling server only metrics...
[Model Analyzer]
[Model Analyzer] Creating model config: peoplenet_config_default
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_default load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer]
[Model Analyzer] Found existing model config: peoplenet_config_0
[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_0 load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer]
[Model Analyzer] Found existing model config: peoplenet_config_1
[Model Analyzer] Setting instance_group to [{'count': 2, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_1 load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer]
[Model Analyzer] Found existing model config: peoplenet_config_2
[Model Analyzer] Setting instance_group to [{'count': 3, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_2 load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer]
[Model Analyzer] Found existing model config: peoplenet_config_3
[Model Analyzer] Setting instance_group to [{'count': 4, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_3 load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer]
[Model Analyzer] Found existing model config: peoplenet_config_4
[Model Analyzer] Setting instance_group to [{'count': 5, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Model peoplenet_config_4 load failed: [StatusCode.UNAVAILABLE] Socket closed
[Model Analyzer] Saved checkpoint to /workspace/checkpoints/3.ckpt
[Model Analyzer] Profile complete. Profiled 2 configurations for models: ['densenet_onnx']
[Model Analyzer]
[Model Analyzer] ERROR: No results found for model: peoplenet
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/entrypoint.py", line 264, in main
    analyzer.profile(client=client,
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 116, in profile
    self._analyze_models()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/analyzer.py", line 248, in _analyze_models
    self._result_manager.compile_and_sort_results()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/result/result_manager.py", line 140, in compile_and_sort_results
    self._add_results_to_heaps()
  File "/usr/local/lib/python3.8/dist-packages/model_analyzer/result/result_manager.py", line 325, in _add_results_to_heaps
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The model peoplenet was not found in the loaded checkpoint.

Please kindly advise on the next step.

nv-braf commented 1 year ago

It looks like the model is not loading onto the tritonserver, which is why you are not seeing any results. This could be due to a number of issues.
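One quick way to see why each load attempt fails is to start Triton by itself against the same repository and watch its log directly. A minimal sketch, assuming the paths from your directory listing above; the container tag is a placeholder and the port mappings are just the defaults:

docker run --rm --gpus all \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /home/user/Documents/models:/models \
  nvcr.io/nvidia/tritonserver:<xx.yy>-py3 \
  tritonserver --model-repository=/models --log-verbose=1

# In a second terminal, check server and model readiness over HTTP
curl -v localhost:8000/v2/health/ready
curl -v localhost:8000/v2/models/peoplenet/ready

If tritonserver itself cannot load the engine (for example, because the TensorRT version used to build the engine does not match the one in the container), the verbose log will show the actual failure reason instead of just a closed socket.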

The best way to debug this is to remove all checkpoints and, on a fresh run, use the --triton-output-path CLI option to specify a log file that will store the tritonserver output messages.
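For example, a sketch of that fresh run reusing the flags from your original command; the checkpoint directory is taken from the log above and the log file path is only illustrative:

rm -f /workspace/checkpoints/*.ckpt
model-analyzer profile \
  --model-repository /home/user/Documents/models/ \
  --profile-models peoplenet \
  --triton-launch-mode=docker \
  --output-model-repository /home/user/Documents/model_analyzer/output/peoplenet/ \
  --override-output-model-repository \
  --triton-output-path /workspace/triton_server.log

The file written by --triton-output-path will contain the tritonserver output, which should include the real reason each peoplenet config failed to load.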

If the error message in that log is still unclear, please post it and I'll help you figure out what has gone wrong.

dyastremsky commented 10 months ago

Closing issue due to inactivity. If you would like us to reopen this issue for follow-up, please let us know.