Open sleepwalker2017 opened 7 months ago
@sleepwalker2017 Hi, did you solve this? I'm facing the same problem as you and have no idea what happened.
I0328 09:48:48.039005 1588 server.cc:677]
+------------------+---------+--------+
| Model | Version | Status |
+------------------+---------+--------+
| ensemble | 1 | READY |
| postprocessing | 1 | READY |
| preprocessing | 1 | READY |
| tensorrt_llm | 1 | READY |
| tensorrt_llm_bls | 1 | READY |
+------------------+---------+--------+
I0328 09:48:49.670322 1588 metrics.cc:877] Collecting metrics for GPU 0: NVIDIA A100-SXM4-80GB
I0328 09:48:49.720217 1588 metrics.cc:770] Collecting CPU metrics
I0328 09:48:49.720351 1588 tritonserver.cc:2508]
+----------------------------------+----------------------------------------------------------------------------------------+
| Option | Value |
+----------------------------------+----------------------------------------------------------------------------------------+
| server_id | triton |
| server_version | 2.43.0 |
| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_ |
| | policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data |
| | parameters statistics trace logging |
| model_repository_path[0] | /tensorrtllm_backend/triton_model_repo |
| model_control_mode | MODE_NONE |
| strict_model_config | 1 |
| rate_limit | OFF |
| pinned_memory_pool_byte_size | 268435456 |
| cuda_memory_pool_byte_size{0} | 67108864 |
| min_supported_compute_capability | 6.0 |
| strict_readiness | 1 |
| exit_timeout | 30 |
| cache_enabled | 0 |
+----------------------------------+----------------------------------------------------------------------------------------+
I0328 09:48:49.722078 1588 grpc_server.cc:2519] Started GRPCInferenceService at 0.0.0.0:8001
I0328 09:48:49.722273 1588 http_server.cc:4637] Started HTTPService at 0.0.0.0:8000
I0328 09:48:49.763159 1588 http_server.cc:320] Started Metrics Service at 0.0.0.0:8002
It looks like there is nothing wrong in the log, but curl returns no output:
curl -X POST localhost:8000/v2/models/tensorrt_llm_bls/generate -d '{"text_input": "What is machine learning?", "max_tokens": 200, "bad_words": "", "stop_words": ""}'
@byshiue @schetlur-nv Is there any information I should provide?
The server seems to be OK according to the log. But when I run

curl -X POST localhost:10086/v2/models/ensemble/generate -d '{"text_input": "What is machine learning?", "max_tokens": 20, "bad_words": "", "stop_words": ""}'

there is no response. Why is that?
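For what it's worth, one way to narrow this down is to separate "server unreachable" from "request hangs": the log shows HTTPService on port 8000, while the curl above targets 10086, so the port mapping is worth double-checking first. Below is a small Python sketch (the port and model name are assumptions taken from this thread) that first hits Triton's standard /v2/health/ready endpoint, then posts to the generate endpoint with an explicit timeout so a hang surfaces as an error instead of silence:

```python
import json
import urllib.request
import urllib.error


def build_generate_payload(text, max_tokens):
    # Same fields as the curl commands in this thread.
    return {
        "text_input": text,
        "max_tokens": max_tokens,
        "bad_words": "",
        "stop_words": "",
    }


def probe(base_url, model="ensemble", timeout=10.0):
    """Check readiness, then call the generate endpoint with a timeout
    so a hanging request fails loudly instead of silently."""
    try:
        with urllib.request.urlopen(f"{base_url}/v2/health/ready",
                                    timeout=timeout) as resp:
            print("ready check HTTP status:", resp.status)
    except urllib.error.URLError as err:
        print("server not reachable:", err)
        return None

    payload = build_generate_payload("What is machine learning?", 20)
    req = urllib.request.Request(
        f"{base_url}/v2/models/{model}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.URLError as err:
        print("generate failed or timed out:", err)
        return None


if __name__ == "__main__":
    # Port 8000 comes from the server log above; adjust if your
    # container maps it elsewhere (e.g. 10086 -> 8000).
    print(probe("http://localhost:8000"))
```

If the readiness check succeeds but the generate call times out, the request is reaching the server and stalling inside the model, which points at the backend rather than networking.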