k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

After starting bash launch_server.sh in the k2-fsa/sherpa/triton/whisper directory, curl returns {"error":"Not Found"} #570

Closed · taorui-plus closed this issue 2 months ago

taorui-plus commented 2 months ago

Here is the complete log output after running bash launch_server.sh; nothing in it looks wrong:

I0409 02:55:34.488607 25157 pinned_memory_manager.cc:275] Pinned memory pool is created at '0x7f8b7c000000' with size 2048000000
I0409 02:55:34.491989 25157 cuda_memory_manager.cc:107] CUDA memory pool is created on device 0 with size 4096000000
I0409 02:55:34.498199 25157 model_lifecycle.cc:461] loading: whisper:1
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024022700
I0409 02:55:38.325204 25157 python_be.cc:2362] TRITONBACKEND_ModelInstanceInitialize: whisper_0_0 (CPU device 0)
[TensorRT-LLM] TensorRT-LLM version: 0.9.0.dev2024022700
I0409 02:55:44.622592 25157 model_lifecycle.cc:827] successfully loaded 'whisper'
I0409 02:55:44.622715 25157 server.cc:606]
+------------------+------+
| Repository Agent | Path |
+------------------+------+
+------------------+------+

I0409 02:55:44.622775 25157 server.cc:633]
+---------+----------------------------------------------------+----------------------------------------------------+
| Backend | Path                                               | Config                                             |
+---------+----------------------------------------------------+----------------------------------------------------+
| python  | /opt/tritonserver/backends/python/libtriton_python | {"cmdline":{"auto-complete-config":"true","backend |
|         | .so                                                | -directory":"/opt/tritonserver/backends","min-comp |
|         |                                                    | ute-capability":"6.000000","default-max-batch-size |
|         |                                                    | ":"4"}}                                            |
|         |                                                    |                                                    |
+---------+----------------------------------------------------+----------------------------------------------------+

I0409 02:55:44.622815 25157 server.cc:676]
+---------+---------+--------+
| Model   | Version | Status |
+---------+---------+--------+
| whisper | 1       | READY  |
+---------+---------+--------+
I0409 02:55:44.672535 25157 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA A10G
I0409 02:55:44.680349 25157 metrics.cc:770] Collecting CPU metrics
I0409 02:55:44.680501 25157 tritonserver.cc:2498]
+----------------------------------+----------------------------------------------------------------------------------+
| Option                           | Value                                                                            |
+----------------------------------+----------------------------------------------------------------------------------+
| server_id                        | triton                                                                           |
| server_version                   | 2.42.0                                                                           |
| server_extensions                | classification sequence model_repository model_repository(unload_dependents) sch |
|                                  | edule_policy model_configuration system_shared_memory cuda_shared_memory binary_ |
|                                  | tensor_data parameters statistics trace logging                                  |
| model_repository_path[0]         | ./model_repo_whisper_trtllm                                                      |
| model_control_mode               | MODE_NONE                                                                        |
| strict_model_config              | 0                                                                                |
| rate_limit                       | OFF                                                                              |
| pinned_memory_pool_byte_size     | 2048000000                                                                       |
| cuda_memory_pool_byte_size{0}    | 4096000000                                                                       |
| min_supported_compute_capability | 6.0                                                                              |
| strict_readiness                 | 1                                                                                |
| exit_timeout                     | 30                                                                               |
| cache_enabled                    | 0                                                                                |
+----------------------------------+----------------------------------------------------------------------------------+

I0409 02:55:44.681779 25157 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0409 02:55:44.682024 25157 http_server.cc:4623] Started HTTPService at 0.0.0.0:10086
I0409 02:55:44.723103 25157 http_server.cc:315] Started Metrics Service at 0.0.0.0:10087

Running curl 0.0.0.0:10086 returns {"error":"Not Found"}, and there are no further log messages, so I cannot tell which step went wrong.
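For reference: Triton's HTTP service only serves endpoints under the /v2 prefix, so a bare curl against the root path returns {"error":"Not Found"} even when the server is healthy. A readiness check against the HTTP port from the log above uses Triton's standard health endpoint and returns HTTP 200 when the server and models are ready:

curl -v 0.0.0.0:10086/v2/health/ready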

yuekaizhang commented 2 months ago


@taorui-plus That is probably not the right way to use curl here. Once the server is up, could you try client.py as described in the README? client.py uses the gRPC client from tritonclient. Serving plain HTTP/curl requests is feasible, it just isn't supported at the moment.

see https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/getting_started/quickstart.html#verify-triton-is-running-correctly
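For illustration, here is a minimal sketch of an inference request through tritonclient's gRPC API, assuming the whisper model name from the log above; the tensor names WAV and TRANSCRIPTS and the dummy waveform are placeholders, so consult the model's config.pbtxt and the repository's client.py for the actual interface and preprocessing:

import numpy as np
import tritonclient.grpc as grpcclient

# Connect to the gRPC endpoint reported in the launch log (port 8001).
client = grpcclient.InferenceServerClient(url="0.0.0.0:8001")

# Stand-in input: one second of 16 kHz silence instead of real audio.
samples = np.zeros((1, 16000), dtype=np.float32)

# "WAV" is a placeholder tensor name; the real one comes from config.pbtxt.
wav = grpcclient.InferInput("WAV", list(samples.shape), "FP32")
wav.set_data_from_numpy(samples)

# "TRANSCRIPTS" is likewise a placeholder output name.
outputs = [grpcclient.InferRequestedOutput("TRANSCRIPTS")]

result = client.infer(model_name="whisper", inputs=[wav], outputs=outputs)
print(result.as_numpy("TRANSCRIPTS"))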

taorui-plus commented 2 months ago

The problem is solved. Thanks a lot for the explanation!