triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
BSD 3-Clause "New" or "Revised" License

Model Analyzer gets stuck #7223

Open riyajatar37003 opened 4 months ago

riyajatar37003 commented 4 months ago

model-analyzer profile --run-config-profile-models-concurrently-enable --override-output-model-repository --model-repository model_repositories --profile-models model1\,model2 --output-model-repository-path ./model1_op --export-path model1_report
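
(Aside: the same options can be collected into a Model Analyzer config file instead of being passed on the command line. A minimal, untested sketch; the key names are taken from the DEBUG config dump in the second comment below, and model1/model2 are the placeholder names used above:)

    # Sketch only: drive the same profile run from a config file.
    # Key names mirror the DEBUG config dump later in this thread.
    cat > config.yaml <<'EOF'
    model_repository: model_repositories
    profile_models:
      - model1
      - model2
    run_config_profile_models_concurrently_enable: true
    override_output_model_repository: true
    output_model_repository_path: ./model1_op
    export_path: model1_report
    EOF
    model-analyzer profile -f config.yaml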

After running the above command, I am getting the following error:

[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA A100-SXM4-40GB with UUID GPU-d9a0447f-f8fa-9d2f-79fc-ecf2567dacc2
[Model Analyzer] WARNING: Overriding the output model repo path "./m_output1"
[Model Analyzer] Starting a local Triton Server
[Model Analyzer] Loaded checkpoint from file /model_repositories/checkpoints/0.ckpt
[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition
[Model Analyzer]
[Model Analyzer] Starting quick mode search to find optimal configs
[Model Analyzer]
[Model Analyzer] Creating model config: m_config_default
[Model Analyzer]
[Model Analyzer] Creating model config: m_v2_onnx_config_default
[Model Analyzer]
[Model Analyzer] Profiling m_config_default: client batch size=1, concurrency=24
[Model Analyzer] Profiling m_v2_onnx_config_default: client batch size=1, concurrency=8
[Model Analyzer]
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] Saved checkpoint to model_repositories/checkpoints/1.ckpt
[Model Analyzer] Creating model config: m_config_0
[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Creating model config: m_v2_onnx_config_0
[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Profiling m_config_0: client batch size=1, concurrency=2
[Model Analyzer] Profiling m_v2_onnx_config_0: client batch size=1, concurrency=2
[Model Analyzer]
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] No changes made to analyzer data, no checkpoint saved.

Traceback (most recent call last):
  File "/opt/app_venv/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 124, in profile
    self._profile_models()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 233, in _profile_models
    self._model_manager.run_models(models=models)
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 145, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 239, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.
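
The repeated "perf_analyzer took very long to exit, killing perf_analyzer" lines indicate that perf_analyzer ran past Model Analyzer's timeout (perf_analyzer_timeout, 600 seconds in the DEBUG dump below). A sketch of a retry with a longer timeout; the flag name is an assumption based on that config key:

    # Assumption: --perf-analyzer-timeout is the CLI form of the
    # perf_analyzer_timeout config key (600 s per the DEBUG dump below).
    model-analyzer profile -f config.yaml --perf-analyzer-timeout 1200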

riyajatar37003 commented 4 months ago

16:36:34 [Model Analyzer] DEBUG: {'always_report_gpu_metrics': False, 'batch_sizes': [1], 'bls_composing_models': [], 'checkpoint_directory': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints', 'client_max_retries': 50, 'client_protocol': 'grpc', 'collect_cpu_metrics': False, 'concurrency': [], 'config_file': 'config.yaml', 'constraints': {}, 'cpu_only_composing_models': [], 'duration_seconds': 3, 'early_exit_enable': False, 'export_path': './profile_results_reranker1', 'filename_model_gpu': 'metrics-model-gpu.csv', 'filename_model_inference': 'metrics-model-inference.csv', 'filename_server_only': 'metrics-server-only.csv', 'genai_perf_flags': {}, 'gpu_output_fields': ['model_name', 'gpu_uuid', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'satisfies_constraints', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'gpus': ['all'], 'inference_output_fields': ['model_name', 'batch_size', 'concurrency', 'model_config_path', 'instance_group', 'max_batch_size', 'satisfies_constraints', 'perf_throughput', 'perf_latency_p99'], 'latency_budget': None, 'min_throughput': None, 'model_repository': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/reranker', 'model_type': 'generic', 'monitoring_interval': 1.0, 'num_configs_per_model': 2, 'num_top_model_configs': 0, 'objectives': {'perf_throughput': 10}, 'output_model_repository_path': './rerenker_output1', 'override_output_model_repository': True, 'perf_analyzer_cpu_util': 5120.0, 'perf_analyzer_flags': {}, 'perf_analyzer_max_auto_adjusts': 10, 'perf_analyzer_path': 'perf_analyzer', 'perf_analyzer_timeout': 600, 'perf_output': False, 'perf_output_path': None, 'plots': [{'name': 'throughput_v_latency', 'title': 'Throughput vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'perf_throughput', 'monotonic': True}, {'name': 'gpu_mem_v_latency', 'title': 'GPU Memory vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'gpu_used_memory', 'monotonic': False}], 'profile_models': [{'model_name': 'bge_reranker_v2_onnx', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1}, {'model_name': 'reranker', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1}], 'reload_model_disable': False, 'request_rate': [], 'request_rate_search_enable': False, 'run_config_profile_models_concurrently_enable': True, 'run_config_search_disable': False, 'run_config_search_max_binary_search_steps': 5, 'run_config_search_max_concurrency': 2, 'run_config_search_max_instance_count': 4, 'run_config_search_max_model_batch_size': 4, 'run_config_search_max_request_rate': 8192, 'run_config_search_min_concurrency': 1, 'run_config_search_min_instance_count': 1, 'run_config_search_min_model_batch_size': 1, 'run_config_search_min_request_rate': 16, 'run_config_search_mode': 'quick', 'server_output_fields': ['model_name', 'gpu_uuid', 'gpu_used_memory', 'gpu_utilization', 'gpu_power_usage'], 'skip_detailed_reports': False, 'skip_summary_reports': False, 'triton_docker_args': {}, 'triton_docker_image': 'nvcr.io/nvidia/tritonserver:24.04-py3', 'triton_docker_labels': {}, 'triton_docker_mounts': [], 'triton_docker_shm_size': None, 'triton_grpc_endpoint': 'localhost:8001', 'triton_http_endpoint': 'localhost:8000', 'triton_install_path': '/opt/tritonserver', 'triton_launch_mode': 'local', 'triton_metrics_url': 'http://localhost:8002/metrics', 'triton_output_path': None, 'triton_server_environment': {}, 'triton_server_flags': {}, 'triton_server_path': 'tritonserver', 'weighting': None}
16:36:34 [Model Analyzer] Initializing GPUDevice handles
16:36:35 [Model Analyzer] Using GPU 0 Tesla V100-SXM2-32GB with UUID GPU-c898354c-1e75-3b40-3c84-2a272ee206c2
16:36:36 [Model Analyzer] WARNING: Overriding the output model repo path "./rerenker_output1"
16:36:36 [Model Analyzer] Starting a local Triton Server
16:36:36 [Model Analyzer] No checkpoint file found, starting a fresh run.
16:36:36 [Model Analyzer] Profiling server only metrics...
16:36:36 [Model Analyzer] DEBUG: Triton Server started.
16:36:46 [Model Analyzer] DEBUG: Stopped Triton Server.
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Starting quick mode search to find optimal configs
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_default
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: reranker_config_default
16:36:46 [Model Analyzer]
16:36:58 [Model Analyzer] DEBUG: Triton Server started.
16:37:07 [Model Analyzer] DEBUG: Model bge_reranker_v2_onnx_config_default loaded
16:37:22 [Model Analyzer] DEBUG: Model reranker_config_default loaded
16:37:22 [Model Analyzer] Profiling bge_reranker_v2_onnx_config_default: client batch size=1, concurrency=8
16:37:22 [Model Analyzer] Profiling reranker_config_default: client batch size=1, concurrency=16
16:37:22 [Model Analyzer]
16:37:22 [Model Analyzer] DEBUG: Running ['mpiexec', '--allow-run-as-root', '--tag-output', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'bge_reranker_v2_onnx', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'bge_reranker_v2_onnx-results.csv', '--verbose-csv', '--concurrency-range', '8', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000', ':', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'reranker', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'reranker-results.csv', '--verbose-csv', '--concurrency-range', '16', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000']
16:37:26 [Model Analyzer] Running perf_analyzer failed with exit status 99:
[1,1]: Measurement Settings
[1,1]: Batch size: 1
[1,1]: Service Kind: Triton
[1,1]: Using "count_windows" mode for stabilization
[1,1]: Minimum number of samples in each window: 50
[1,1]: Using synchronous calls for inference
[1,1]: Stabilizing using average latency
[1,1]:
[1,0]: Measurement Settings
[1,0]: Batch size: 1
[1,0]: Service Kind: Triton
[1,0]: Using "count_windows" mode for stabilization
[1,0]: Minimum number of samples in each window: 50
[1,0]: Using synchronous calls for inference
[1,0]: Stabilizing using average latency
[1,0]:
[1,0]:Request concurrency: 8
[1,1]:Request concurrency: 16
[1,1]:Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
[1,1]:Thread [0] had error: Failed to process the request(s) for model instance 'reranker_0_4', mes
16:37:26 [Model Analyzer] DEBUG: Measurement for [0, 0, 0, 0]: None.
16:37:26 [Model Analyzer] Saved checkpoint to /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints/0.ckpt
16:37:26 [Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_0
16:37:26 [Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
16:37:26 [Model Analyzer] Setting max_batch_size to 1
16:37:26 [Model Analyzer] Enabling dynamic_batching
16:37:26 [Model Analyzer]
16:37:26 [Model Analyzer] Creating model config: reranker_config_0
16:37:26 [Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
16:37:26 [Model Analyzer] Setting max_batch_size to 1
16:37:26 [Model Analyzer] Enabling dynamic_batching
16:37:26 [Model Analyzer]
16:37:31 [Model Analyzer] DEBUG: Stopped Triton Server.
16:37:31 [Model Analyzer] DEBUG: Triton Server started.
16:37:34 [Model Analyzer] DEBUG: Model bge_reranker_v2_onnx_config_0 loaded
16:37:47 [Model Analyzer] DEBUG: Model reranker_config_0 loaded
16:37:47 [Model Analyzer] Profiling bge_reranker_v2_onnx_config_0: client batch size=1, concurrency=2
16:37:47 [Model Analyzer] Profiling reranker_config_0: client batch size=1, concurrency=2
16:37:47 [Model Analyzer]
16:37:47 [Model Analyzer] DEBUG: Running ['mpiexec', '--allow-run-as-root', '--tag-output', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'bge_reranker_v2_onnx', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'bge_reranker_v2_onnx-results.csv', '--verbose-csv', '--concurrency-range', '2', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000', ':', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'reranker', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'reranker-results.csv', '--verbose-csv', '--concurrency-range', '2', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000']
16:37:51 [Model Analyzer] Running perf_analyzer failed with exit status 99:
[1,0]: Measurement Settings
[1,0]: Batch size: 1
[1,0]: Service Kind: Triton
[1,0]: Using "count_windows" mode for stabilization
[1,0]: Minimum number of samples in each window: 50
[1,0]: Using synchronous calls for inference
[1,0]: Stabilizing using average latency
[1,0]:
[1,1]: Measurement Settings
[1,1]: Batch size: 1
[1,1]: Service Kind: Triton
[1,1]: Using "count_windows" mode for stabilization
[1,1]: Minimum number of samples in each window: 50
[1,1]: Using synchronous calls for inference
[1,1]: Stabilizing using average latency
[1,1]:
[1,0]:Request concurrency: 2
[1,1]:Request concurrency: 2
[1,1]:Failed to maintain requested inference load. Worker thread(s) failed to generate concurrent requests.
[1,1]:Thread [0] had error: [request id: ] Exceeds maximum queue size
[1,1]:
[1,
16:37:51 [Model Analyzer] No changes made to analyzer data, no checkpoint saved.
16:37:56 [Model Analyzer] DEBUG: Stopped Triton Server.
Traceback (most recent call last):
  File "/opt/app_venv/bin/model-analyzer", line 8, in <module>
    sys.exit(main())
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/entrypoint.py", line 278, in main
    analyzer.profile(
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 124, in profile
    self._profile_models()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 233, in _profile_models
    self._model_manager.run_models(models=models)
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 145, in run_models
    self._stop_ma_if_no_valid_measurement_threshold_reached()
  File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 239, in _stop_ma_if_no_valid_measurement_threshold_reached
    raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.
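
The exit-status-99 output above truncates the underlying client errors ("Failed to process the request(s) for model instance 'reranker_0_4', mes" and "Exceeds maximum queue size", which Triton typically returns when a request would overflow the dynamic batcher's max_queue_size). One way to surface the full message is to run perf_analyzer by hand against a server that has the generated configs loaded; a sketch that reuses only flags appearing in the logged mpiexec invocation above:

    # Sketch: reproduce a single profiling run outside mpiexec so the
    # complete error text is printed; flags copied from the log above.
    perf_analyzer -m reranker -b 1 -u localhost:8001 -i grpc \
        --concurrency-range 2 --measurement-mode count_windows

Setting triton_output_path in the config (it is None in the dump above) should also capture the Triton Server log to a file for inspection.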

ganeshku1 commented 4 months ago

@riyajatar37003 Can you please provide the details of the bug using our bug report template here: https://github.com/triton-inference-server/server/issues/new/choose