triton-inference-server / model_analyzer

Triton Model Analyzer is a CLI tool that helps you understand the compute and memory requirements of Triton Inference Server models.
Apache License 2.0

model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. #920

Open gizarchik opened 3 months ago

gizarchik commented 3 months ago

When I try to analyze my ensemble I get this error:

Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 128, in profile
    self._profile_models()
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 247, in _profile_models
    self._model_manager.run_models(models=[model])
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/model_manager.py", line 151, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/model_manager.py", line 245, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.

My command:

model-analyzer profile -f config.yaml 

My config in config.yaml:

model_repository: /home/triton-server/profiling/model_mood_ensemble
override_output_model_repository: True

run_config_search_mode: quick
collect_cpu_metrics: True

cpu_only_composing_models:
  ['file_downloader', 'model_preprocessor', 'model_postprocessor']

profile_models:
  model_mood_ensemble:
    perf_analyzer_flags:
      input-data: perf_analyzer_data.json
      shape: 'link_to_file:1'

triton_launch_mode: remote
triton_http_endpoint: 10.111.13.50:8000
triton_grpc_endpoint: 10.111.13.50:8001
triton_metrics_url: http://10.111.13.50:8002/metrics

PA logs (perf_analyzer_error.log):

Command: 
perf_analyzer -m model_mood_ensemble -b 1 -u 10.111.13.85:8001 -i grpc -f model_mood_ensemble-results.csv --verbose-csv --concurrency-range 4 --input-data perf_analyzer_data.json --shape link_to_file:1 --measurement-mode count_windows --collect-metrics --metrics-url http://10.111.13.85:8002/metrics --metrics-interval 1000

Error: perf_analyzer did not produce any output. It was likely terminated with a SIGABRT.

Command: 
perf_analyzer -m model_mood_ensemble -b 1 -u 10.111.13.85:8001 -i grpc -f model_mood_ensemble-results.csv --verbose-csv --concurrency-range 2 --input-data perf_analyzer_data.json --shape link_to_file:1 --measurement-mode count_windows --collect-metrics --metrics-url http://10.111.13.85:8002/metrics --metrics-interval 1000

Error: perf_analyzer did not produce any output. It was likely terminated with a SIGABRT.
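The "likely terminated with a SIGABRT" message can be double-checked when rerunning the perf_analyzer command by hand: a process killed by a signal exits with status 128 + the signal number, so SIGABRT (signal 6) shows up as exit code 134. A minimal shell sketch of that mapping (independent of perf_analyzer itself):

```shell
# Simulate a process aborting, then decode the exit status.
sh -c 'kill -s ABRT $$'        # subshell sends SIGABRT to itself
status=$?
echo "exit code: $status"      # 128 + 6 = 134 for SIGABRT
kill -l $((status - 128))      # maps 134 back to the signal name: ABRT
```

Checking `$?` this way after a manual rerun confirms whether the process really aborted, or was killed some other way (a SIGKILL, e.g. from Model Analyzer's timeout, would show 137).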

MA logs:

[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA A40 with UUID GPU-b84a3bad-5448-2ad9-3780-f0261c1d1eac
[Model Analyzer] Using GPU 1 NVIDIA A40 with UUID GPU-4aad5a35-99bd-90bd-b0e6-11bac59e302c
[Model Analyzer] WARNING: Overriding the output model repo path "/home/triton-server/profiling/output_model_repository"
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] No checkpoint file found, starting a fresh run.
[Model Analyzer] WARNING: A model not being profiled (custom_mood_ensemble) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (file_downloader) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (model) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (model_classifier) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (model_classifier_postprocessor) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (model_postprocessor) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] WARNING: A model not being profiled (model_preprocessor) is loaded on the remote Tritonserver. This could impact the profile results.
[Model Analyzer] Profiling server only metrics...
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] Using remote Triton Server
[Model Analyzer] WARNING: GPU memory metrics reported in the remote mode are not accurate. Model Analyzer uses Triton explicit model control to load/unload models. Some frameworks do not release the GPU memory even when the memory is not being used. Consider using the "local" or "docker" mode if you want to accurately monitor the GPU memory usage for different models.
[Model Analyzer] 
[Model Analyzer] Starting quick mode search to find optimal configs
[Model Analyzer] 
[Model Analyzer] Creating model config: file_downloader_config_default
[Model Analyzer] 
[Model Analyzer] Creating model config: model_preprocessor_config_default
[Model Analyzer] 
[Model Analyzer] Creating model config: model_config_default
[Model Analyzer] 
[Model Analyzer] Creating model config: model_postprocessor_config_default
[Model Analyzer] 
[Model Analyzer] Creating ensemble model config: model_mood_ensemble_config_default
[Model Analyzer] Profiling model_mood_ensemble_config_default: concurrency=4
[Model Analyzer] WARNING: CPU metrics are being collected. This can affect the latency or throughput numbers reported by perf analyzer.
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] Saved checkpoint to /home/triton-server/profiling/checkpoints/0.ckpt
[Model Analyzer] Creating model config: file_downloader_config_0
[Model Analyzer]   Setting instance_group to [{'count': 1, 'kind': 'KIND_CPU'}]
[Model Analyzer] 
[Model Analyzer] Creating model config: model_preprocessor_config_0
[Model Analyzer]   Setting instance_group to [{'count': 1, 'kind': 'KIND_CPU'}]
[Model Analyzer]   Setting max_batch_size to 1
[Model Analyzer]   Enabling dynamic_batching
[Model Analyzer] 
[Model Analyzer] Creating model config: model_config_0
[Model Analyzer]   Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] 
[Model Analyzer] Creating model config: model_postprocessor_config_0
[Model Analyzer]   Setting instance_group to [{'count': 1, 'kind': 'KIND_CPU'}]
[Model Analyzer] 
[Model Analyzer] Creating ensemble model config: model_mood_ensemble_config_0
[Model Analyzer]   Setting max_batch_size to 1
[Model Analyzer] Profiling model_mood_ensemble_config_0: concurrency=2
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] No changes made to analyzer data, no checkpoint saved.
Traceback (most recent call last):
  File "/usr/local/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 128, in profile
    self._profile_models()
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/analyzer.py", line 247, in _profile_models
    self._model_manager.run_models(models=[model])
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/model_manager.py", line 151, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/usr/local/lib/python3.10/dist-packages/model_analyzer/model_manager.py", line 245, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.

Moreover, according to the tritonserver logs, the ensemble receives requests and processes them. Also, when I tried to run perf_analyzer separately, I hit a similar problem, so the issue may be in perf_analyzer itself.
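Since MA reports "perf_analyzer took very long to exit, killing perf_analyzer", one thing that might help surface the underlying error is giving PA more time and capturing its output. A sketch of extra config.yaml options (assuming Model Analyzer's `perf_analyzer_timeout`, `perf_output`, and `perf_output_path` options; exact names and defaults should be checked against the installed MA version):

```yaml
# Hypothetical additions to config.yaml: lengthen the PA timeout and
# write PA's stdout/stderr to a file for inspection.
perf_analyzer_timeout: 1200        # seconds before MA kills perf_analyzer
perf_output: True                  # capture perf_analyzer output
perf_output_path: ./pa_output.log  # where to write it
```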

MA version: 1.41.0 (Release 2.47.0, corresponding to NGC container 24.06)

nv-braf commented 3 months ago

It appears that MA is operating correctly and the problem is in PA. @matthewkotila can you look into this?