riyajatar37003 opened this issue 4 months ago
16:36:34 [Model Analyzer] DEBUG:
{'always_report_gpu_metrics': False,
'batch_sizes': [1],
'bls_composing_models': [],
'checkpoint_directory': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints',
'client_max_retries': 50,
'client_protocol': 'grpc',
'collect_cpu_metrics': False,
'concurrency': [],
'config_file': 'config.yaml',
'constraints': {},
'cpu_only_composing_models': [],
'duration_seconds': 3,
'early_exit_enable': False,
'export_path': './profile_results_reranker1',
'filename_model_gpu': 'metrics-model-gpu.csv',
'filename_model_inference': 'metrics-model-inference.csv',
'filename_server_only': 'metrics-server-only.csv',
'genai_perf_flags': {},
'gpu_output_fields': ['model_name',
'gpu_uuid',
'batch_size',
'concurrency',
'model_config_path',
'instance_group',
'satisfies_constraints',
'gpu_used_memory',
'gpu_utilization',
'gpu_power_usage'],
'gpus': ['all'],
'inference_output_fields': ['model_name',
'batch_size',
'concurrency',
'model_config_path',
'instance_group',
'max_batch_size',
'satisfies_constraints',
'perf_throughput',
'perf_latency_p99'],
'latency_budget': None,
'min_throughput': None,
'model_repository': '/app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/reranker',
'model_type': 'generic',
'monitoring_interval': 1.0,
'num_configs_per_model': 2,
'num_top_model_configs': 0,
'objectives': {'perf_throughput': 10},
'output_model_repository_path': './rerenker_output1',
'override_output_model_repository': True,
'perf_analyzer_cpu_util': 5120.0,
'perf_analyzer_flags': {},
'perf_analyzer_max_auto_adjusts': 10,
'perf_analyzer_path': 'perf_analyzer',
'perf_analyzer_timeout': 600,
'perf_output': False,
'perf_output_path': None,
'plots': [{'name': 'throughput_v_latency', 'title': 'Throughput vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'perf_throughput', 'monotonic': True},
{'name': 'gpu_mem_v_latency', 'title': 'GPU Memory vs. Latency', 'x_axis': 'perf_latency_p99', 'y_axis': 'gpu_used_memory', 'monotonic': False}],
'profile_models': [{'model_name': 'bge_reranker_v2_onnx', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1},
{'model_name': 'reranker', 'cpu_only': False, 'objectives': {'perf_throughput': 10}, 'parameters': {'batch_sizes': [1], 'concurrency': [], 'request_rate': []}, 'weighting': 1}],
'reload_model_disable': False,
'request_rate': [],
'request_rate_search_enable': False,
'run_config_profile_models_concurrently_enable': True,
'run_config_search_disable': False,
'run_config_search_max_binary_search_steps': 5,
'run_config_search_max_concurrency': 2,
'run_config_search_max_instance_count': 4,
'run_config_search_max_model_batch_size': 4,
'run_config_search_max_request_rate': 8192,
'run_config_search_min_concurrency': 1,
'run_config_search_min_instance_count': 1,
'run_config_search_min_model_batch_size': 1,
'run_config_search_min_request_rate': 16,
'run_config_search_mode': 'quick',
'server_output_fields': ['model_name',
'gpu_uuid',
'gpu_used_memory',
'gpu_utilization',
'gpu_power_usage'],
'skip_detailed_reports': False,
'skip_summary_reports': False,
'triton_docker_args': {},
'triton_docker_image': 'nvcr.io/nvidia/tritonserver:24.04-py3',
'triton_docker_labels': {},
'triton_docker_mounts': [],
'triton_docker_shm_size': None,
'triton_grpc_endpoint': 'localhost:8001',
'triton_http_endpoint': 'localhost:8000',
'triton_install_path': '/opt/tritonserver',
'triton_launch_mode': 'local',
'triton_metrics_url': 'http://localhost:8002/metrics',
'triton_output_path': None,
'triton_server_environment': {},
'triton_server_flags': {},
'triton_server_path': 'tritonserver',
'weighting': None}
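
For reference, a minimal sketch of the config.yaml that could produce the effective settings in this dump (key names mirror the DEBUG output above; the exact schema should be checked against the Model Analyzer documentation):

```shell
# Hypothetical reconstruction of config.yaml; every key is taken from
# the DEBUG dump above, values copied verbatim.
cat > config.yaml <<'EOF'
checkpoint_directory: /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/checkpoints
model_repository: /app/snow.atg_arch_only.home/users/ariyaz/ml_repos/model_repositories/reranker
export_path: ./profile_results_reranker1
output_model_repository_path: ./rerenker_output1
override_output_model_repository: true
run_config_profile_models_concurrently_enable: true
run_config_search_mode: quick
triton_launch_mode: local
profile_models:
  - bge_reranker_v2_onnx
  - reranker
EOF
```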
16:36:34 [Model Analyzer] Initializing GPUDevice handles
16:36:35 [Model Analyzer] Using GPU 0 Tesla V100-SXM2-32GB with UUID GPU-c898354c-1e75-3b40-3c84-2a272ee206c2
16:36:36 [Model Analyzer] WARNING: Overriding the output model repo path "./rerenker_output1"
16:36:36 [Model Analyzer] Starting a local Triton Server
16:36:36 [Model Analyzer] No checkpoint file found, starting a fresh run.
16:36:36 [Model Analyzer] Profiling server only metrics...
16:36:36 [Model Analyzer] DEBUG: Triton Server started.
16:36:46 [Model Analyzer] DEBUG: Stopped Triton Server.
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Starting quick mode search to find optimal configs
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: bge_reranker_v2_onnx_config_default
16:36:46 [Model Analyzer]
16:36:46 [Model Analyzer] Creating model config: reranker_config_default
16:36:46 [Model Analyzer]
16:36:58 [Model Analyzer] DEBUG: Triton Server started.
16:37:07 [Model Analyzer] DEBUG: Model bge_reranker_v2_onnx_config_default loaded
16:37:22 [Model Analyzer] DEBUG: Model reranker_config_default loaded
16:37:22 [Model Analyzer] Profiling bge_reranker_v2_onnx_config_default: client batch size=1, concurrency=8
16:37:22 [Model Analyzer] Profiling reranker_config_default: client batch size=1, concurrency=16
16:37:22 [Model Analyzer]
16:37:22 [Model Analyzer] DEBUG: Running ['mpiexec', '--allow-run-as-root', '--tag-output', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'bge_reranker_v2_onnx', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'bge_reranker_v2_onnx-results.csv', '--verbose-csv', '--concurrency-range', '8', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000', ':', '-n', '1', 'perf_analyzer', '--enable-mpi', '-m', 'reranker', '-b', '1', '-u', 'localhost:8001', '-i', 'grpc', '-f', 'reranker-results.csv', '--verbose-csv', '--concurrency-range', '16', '--measurement-mode', 'count_windows', '--collect-metrics', '--metrics-url', 'http://localhost:8002/metrics', '--metrics-interval', '1000']
16:37:26 [Model Analyzer] Running perf_analyzer failed with exit status 99:
[1,1]
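
Exit status 99 with the `[1,1]`-tagged output truncated leaves little to diagnose. One way to surface the underlying error (a debugging sketch, not a step Model Analyzer performs itself) is to keep the Triton server running and invoke a single perf_analyzer process by hand, reusing the arguments from the mpiexec command above but dropping the MPI wrapper and the metrics-collection flags:

```shell
# Run one of the two perf_analyzer processes directly so its error is
# printed to the terminal; all flags are copied from the log line above.
perf_analyzer -m bge_reranker_v2_onnx -b 1 \
  -u localhost:8001 -i grpc \
  --concurrency-range 8 \
  --measurement-mode count_windows
```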
@riyajatar37003 Can you please provide the details of the bug using our bug report template here: https://github.com/triton-inference-server/server/issues/new/choose
model-analyzer profile --run-config-profile-models-concurrently-enable --override-output-model-repository --model-repository model_repositories --profile-models model1\,model2 --output-model-repository-path ./model1_op --export-path model1_report

After running the above command, I am getting the following error:
[Model Analyzer] Initializing GPUDevice handles
[Model Analyzer] Using GPU 0 NVIDIA A100-SXM4-40GB with UUID GPU-d9a0447f-f8fa-9d2f-79fc-ecf2567dacc2
[Model Analyzer] WARNING: Overriding the output model repo path "./m_output1"
[Model Analyzer] Starting a local Triton Server
[Model Analyzer] Loaded checkpoint from file /model_repositories/checkpoints/0.ckpt
[Model Analyzer] GPU devices match checkpoint - skipping server metric acquisition
[Model Analyzer]
[Model Analyzer] Starting quick mode search to find optimal configs
[Model Analyzer]
[Model Analyzer] Creating model config: m_config_default
[Model Analyzer]
[Model Analyzer] Creating model config: m_v2_onnx_config_default
[Model Analyzer]
[Model Analyzer] Profiling m_config_default: client batch size=1, concurrency=24
[Model Analyzer] Profiling m_v2_onnx_config_default: client batch size=1, concurrency=8
[Model Analyzer]
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] Saved checkpoint to model_repositories/checkpoints/1.ckpt
[Model Analyzer] Creating model config: m_config_0
[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Creating model config: m_v2_onnx_config_0
[Model Analyzer] Setting instance_group to [{'count': 1, 'kind': 'KIND_GPU'}]
[Model Analyzer] Setting max_batch_size to 1
[Model Analyzer] Enabling dynamic_batching
[Model Analyzer]
[Model Analyzer] Profiling m_config_0: client batch size=1, concurrency=2
[Model Analyzer] Profiling m_v2_onnx_config_0: client batch size=1, concurrency=2
[Model Analyzer]
[Model Analyzer] perf_analyzer took very long to exit, killing perf_analyzer
[Model Analyzer] perf_analyzer did not produce any output.
[Model Analyzer] No changes made to analyzer data, no checkpoint saved.
Traceback (most recent call last):
File "/opt/app_venv/bin/model-analyzer", line 8, in
sys.exit(main())
File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/entrypoint.py", line 278, in main
analyzer.profile(
File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 124, in profile
self._profile_models()
File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/analyzer.py", line 233, in _profile_models
self._model_manager.run_models(models=models)
File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 145, in run_models
self._stop_ma_if_no_valid_measurement_threshold_reached()
File "/opt/app_venv/lib/python3.10/site-packages/model_analyzer/model_manager.py", line 239, in _stop_ma_if_no_valid_measurement_threshold_reached
raise TritonModelAnalyzerException(
model_analyzer.model_analyzer_exceptions.TritonModelAnalyzerException: The first 2 attempts to acquire measurements have failed. Please examine the Tritonserver/PA error logs to determine what has gone wrong.
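
The exception only reports that the first two measurement attempts failed; the real cause is in the perf_analyzer and Triton logs it points to. Assuming the CLI flag spellings follow the config keys in the dump above (perf_output, perf_output_path, perf_analyzer_timeout, triton_output_path — inferred, not verified), a rerun that captures those logs and gives perf_analyzer more time before it is killed might look like:

```shell
# Sketch: same profile run, but with perf_analyzer and Triton output
# written to files and a longer perf_analyzer timeout (the dump above
# shows the default of 600 s).
# Note: --profile-models takes a plain comma-separated list; the
# backslash in the original command should not be needed.
model-analyzer profile \
  --model-repository model_repositories \
  --profile-models model1,model2 \
  --run-config-profile-models-concurrently-enable \
  --override-output-model-repository \
  --output-model-repository-path ./model1_op \
  --export-path model1_report \
  --perf-output --perf-output-path ./perf_analyzer.log \
  --triton-output-path ./triton_server.log \
  --perf-analyzer-timeout 1200
```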