triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

--triton-launch-mode=remote #874

Open riyajatar37003 opened 4 months ago

riyajatar37003 commented 4 months ago

Hi,

Can you share an example command for this mode?

I am launching the server this way:

tritonserver --model-control-mode explicit --exit-on-error=false --model-repository=/tmp/models

and in the other container I am running:

model-analyzer profile \
    --profile-models reranker \
    --triton-launch-mode=remote \
    --output-model-repository-path ./output \
    --export-path profile_results \
    --triton-http-endpoint

but tritonserver itself is not launching.

tgerdesnv commented 4 months ago

--triton-launch-mode=remote tells model analyzer to not launch tritonserver. The expectation is that there is already a server up and running (usually on a different machine).
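A minimal sketch of the two-container setup described here (the model repository path and `localhost:8000` endpoint are placeholders, not values from this thread; `8000` is Triton's default HTTP port):

```shell
# Container / machine 1: launch Triton yourself. Remote mode never starts it.
# --model-control-mode explicit lets Model Analyzer load/unload model variants.
tritonserver --model-control-mode explicit \
    --model-repository=/tmp/models

# Container / machine 2: point Model Analyzer at the already-running server.
model-analyzer profile \
    --profile-models reranker \
    --triton-launch-mode=remote \
    --triton-http-endpoint localhost:8000 \
    --output-model-repository-path ./output \
    --export-path profile_results
```

The key difference from the other launch modes is that Model Analyzer only talks to the endpoint; if nothing is listening there, profiling fails rather than a server being spawned.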

riyajatar37003 commented 4 months ago

What is this then?

This mode is beneficial when you want to use an already-running Triton Inference Server. You may provide the URL for the Triton instance's HTTP or GRPC endpoint, depending on your chosen client protocol, using the --triton-http-endpoint and --triton-grpc-endpoint flags. You should also make sure that the same GPUs are available to both the Inference Server and Model Analyzer, and that they are on the same machine. Triton Server in this mode needs to be launched with the --model-control-mode explicit flag to support loading/unloading of models.
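Before profiling in remote mode, it is worth verifying that the endpoint is actually reachable from the Model Analyzer container. Triton exposes standard health routes over HTTP (the host/port below is a placeholder for your server's address):

```shell
# Returns HTTP 200 when the server is up and ready to serve models.
curl -v localhost:8000/v2/health/ready
```

If this fails from inside the Model Analyzer container (but works on the server host), the problem is container networking rather than Model Analyzer itself.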

riyajatar37003 commented 4 months ago

now i am getting this error

[Model Analyzer] Initializing GPUDevice handles

[Model Analyzer] Using GPU 0 NVIDIA A100-SXM4-40GB with UUID GPU-d9a0447f-f8fa-9d2f-79fc-ecf2567dacc2

[Model Analyzer] WARNING: Overriding the output model repo path "./rerenker_output1"

[Model Analyzer] Starting a local Triton Server

[Model Analyzer] Loaded checkpoint from file /model_repositories/checkpoints/0.ckpt

[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition

[Model Analyzer]

[Model Analyzer] Starting quick mode search to find optimal configs

[Model Analyzer]

[Model Analyzer] Creating model config: reranker_config_default

[Model Analyzer]

[Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_default

[Model Analyzer]

[Model Analyzer] Profiling reranker_config_default: client batch size=1, concurrency=24

[Model Analyzer] Profiling bge_reranker_v2_onnx_config_default: client batch size=1, concurrency=8

[Model Analyzer]

[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer

[Model Analyzer] perf_analyzer did not produce any output.

[Model Analyzer] Saved checkpoint to model_repositories/checkpoints/1.ckpt

[Model Analyzer] Creating model config: reranker_config_0

[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]

[Model Analyzer] Setting max_batch_size to 1

[Model Analyzer] Enabling dynamic_batching

[Model Analyzer]

[Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_0

[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]

[Model Analyzer] Setting max_batch_size to 1

[Model Analyzer] Enabling dynamic_batching

[Model Analyzer]

[Model Analyzer] Profiling reranker_config_0: client batch size=1, concurrency=2

[Model Analyzer] Profiling bge_reranker_v2_onnx_config_0: client batch size=1, concurrency=2

[Model Analyzer]

[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer

[Model Analyzer] perf_analyzer did not produce any output.

[Model Analyzer] No changes made to analyzer data, no checkpoint saved.

Traceback (most recent call last):
  File "/opt/app_venv/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 124, in profile
    self._profile_models()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 233, in _profile_models
    self._model_manager.run_models(models=models)
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 145, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 239, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.

nv-braf commented 4 months ago

MA is not receiving a measurement from Perf Analyzer within the timeout window (600s). After two attempts without measurements, MA exits and directs you to examine the error logs to determine what has gone wrong. There can be a variety of reasons why this is occurring. Please examine the PA error log for additional details.
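Two things that may help narrow this down (the model name and endpoint below are placeholders, and the timeout flag name should be checked against your installed Model Analyzer version's --help output):

```shell
# Run perf_analyzer by hand against the same endpoint; any error it prints
# here is the one Model Analyzer's log is telling you to go look for.
perf_analyzer -m reranker -u localhost:8000 --concurrency-range 1:1

# If measurements are merely slow rather than failing, raise the per-attempt
# timeout window from its 600s default.
model-analyzer profile \
    --profile-models reranker \
    --triton-launch-mode=remote \
    --triton-http-endpoint localhost:8000 \
    --perf-analyzer-timeout 1200
```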