triton-inference-server / server

The Triton Inference Server provides an optimized cloud and edge inferencing solution.
https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html

perf_analyzer failed to test model on Triton server #6495

Open nyanmn opened 1 year ago

nyanmn commented 1 year ago

Description I get the following error when running the command perf_analyzer -m densenet_onnx --concurrency-range 1:4.

error: failed to get model metadata: failed to parse the request JSON buffer: Invalid value. at 1

Triton Information The following Docker images are used.

nvcr.io/nvidia/tritonserver                            23.10-py3                           
nvcr.io/nvidia/tritonserver                            23.10-py3-sdk

Are you using the Triton container or did you build it yourself? Container

I can launch the Triton server container successfully.

+----------------------------------+------------------------------------------------------------------------------------------------------------------------+
| Option                           | Value                                                                                                                  |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------+
| server_id                        | triton                                                                                                                 |
| server_version                   | 2.39.0                                                                                                                 |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration syste |
|                                  | m_shared_memory cuda_shared_memory binary_tensor_data parameters statistics trace logging                              |
| model_repository_path[0]         | /mnt                                                                                                                   |
| model_control_mode               | MODE_NONE                                                                                                              |
| strict_model_config              | 0                                                                                                                      |
| rate_limit                       | OFF                                                                                                                    |
| pinned_memory_pool_byte_size     | 268435456                                                                                                              |
| cuda_memory_pool_byte_size{0}    | 67108864                                                                                                               |
| cuda_memory_pool_byte_size{1}    | 67108864                                                                                                               |
| cuda_memory_pool_byte_size{2}    | 67108864                                                                                                               |
| cuda_memory_pool_byte_size{3}    | 67108864                                                                                                               |
| min_supported_compute_capability | 6.0                                                                                                                    |
| strict_readiness                 | 1                                                                                                                      |
| exit_timeout                     | 30                                                                                                                     |
| cache_enabled                    | 0                                                                                                                      |
+----------------------------------+------------------------------------------------------------------------------------------------------------------------+

I1030 00:52:09.245083 106 grpc_server.cc:2513] Started GRPCInferenceService at 0.0.0.0:8001
I1030 00:52:09.245466 106 http_server.cc:4497] Started HTTPService at 0.0.0.0:8000
I1030 00:52:09.287584 106 http_server.cc:270] Started Metrics Service at 0.0.0.0:8002

I launched the client container and tested the command curl -v 0.0.0.0:8000/v2/health/ready, but the response is HTTP/1.1 404 Not Found:

root@workstation:/mnt# curl -v 0.0.0.0:8000/v2/health/ready
*   Trying 0.0.0.0:8000...
* Connected to 0.0.0.0 (127.0.0.1) port 8000 (#0)
> GET /v2/health/ready HTTP/1.1
> Host: 0.0.0.0:8000
> User-Agent: curl/7.81.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 404 Not Found
< Date: Mon, 30 Oct 2023 01:19:54 GMT
< Content-Length: 9
< Content-Type: text/plain; charset=utf-8
<
* Connection #0 to host 0.0.0.0 left intact
Not found
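
For comparison, a server that is up and ready answers this endpoint with a success status; a sketch of the expected response line (exact headers may vary by version):

< HTTP/1.1 200 OK

The 404 here instead suggests the request is not reaching Triton's HTTP service at all.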

What is wrong with my test? The perf_analyzer output is:

root@workstation:/mnt# perf_analyzer -m densenet_onnx --concurrency-range 1:4
error: failed to get model metadata: failed to parse the request JSON buffer: Invalid value. at 1
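
The model-metadata request that perf_analyzer issues first can also be reproduced by hand; a sketch using the standard v2 model-metadata endpoint (model name taken from the command above):

curl -v 0.0.0.0:8000/v2/models/densenet_onnx

On a working server this returns a JSON description of the model's inputs and outputs; a plain-text 404 body like "Not found" above is not valid JSON, which is consistent with the "failed to parse the request JSON buffer: Invalid value. at 1" error.
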
krishung5 commented 12 months ago

Hi @nyanmn, it seems like you are running perf_analyzer in the SDK container. Could you check whether the Triton server container exposes the port (i.e., adding -p8000:8000 to the docker run command) and whether the client/SDK container uses the host network (i.e., adding --net=host to the docker run command)?
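
For example, a minimal sketch of the two docker run commands with those flags (image tags are taken from the issue; the host-side model-repository path and the --gpus flag are assumptions, with the repository mounted at /mnt to match the model_repository_path shown above):

# Server container: publish the HTTP, GRPC, and metrics ports
docker run --gpus all --rm -p8000:8000 -p8001:8001 -p8002:8002 \
    -v /path/to/model_repository:/mnt \
    nvcr.io/nvidia/tritonserver:23.10-py3 \
    tritonserver --model-repository=/mnt

# Client/SDK container: share the host network so 0.0.0.0:8000 reaches the server
docker run --rm -it --net=host nvcr.io/nvidia/tritonserver:23.10-py3-sdk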