triton-inference-server / tensorrtllm_backend

The Triton TensorRT-LLM Backend
Apache License 2.0

Error in streaming mode noting that the execute function should return None #488

Closed: kisseternity closed this issue 3 weeks ago

kisseternity commented 3 weeks ago

System Info

GPU: H100, with the latest v0.9.0 code and the image from NGC

Who can help?

No response

Reproduction

  1. Build a Llama 2 13B engine.
  2. Launch the Triton server in decoupled mode.
  3. Use inflight_batcher_llm/client/inflight_batcher_llm_client.py to test streaming mode (a minimal client sketch follows these steps).
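
For reference, here is a minimal decoupled streaming client along the lines of what that script does, using the tritonclient gRPC streaming API. The model name, tensor names, shapes, and token ids below are assumptions based on the tensorrt_llm model's typical signature, not values taken from the failing setup:

```python
# Minimal streaming-client sketch (assumed names/shapes, not the exact script).
import queue

import numpy as np
import tritonclient.grpc as grpcclient

responses = queue.Queue()

def callback(result, error):
    # In decoupled mode every partial result (or error) arrives here.
    responses.put(error if error is not None else result)

# Hypothetical pre-tokenized prompt; the real client tokenizes with the
# model's tokenizer before building these tensors.
input_ids = np.array([[1, 15043, 3186]], dtype=np.int32)

inputs = [
    grpcclient.InferInput("input_ids", input_ids.shape, "INT32"),
    grpcclient.InferInput("input_lengths", [1, 1], "INT32"),
    grpcclient.InferInput("request_output_len", [1, 1], "INT32"),
    grpcclient.InferInput("streaming", [1, 1], "BOOL"),
]
inputs[0].set_data_from_numpy(input_ids)
inputs[1].set_data_from_numpy(np.array([[input_ids.shape[1]]], dtype=np.int32))
inputs[2].set_data_from_numpy(np.array([[64]], dtype=np.int32))
inputs[3].set_data_from_numpy(np.array([[True]], dtype=bool))

client = grpcclient.InferenceServerClient(url="localhost:8001")
client.start_stream(callback=callback)
client.async_stream_infer("tensorrt_llm", inputs, request_id="1")
client.stop_stream()  # returns after in-flight responses have been delivered
client.close()

while not responses.empty():
    item = responses.get()
    if isinstance(item, Exception):
        raise item  # e.g. the "execute function must return None" error below
    print(item.as_numpy("output_ids"))
```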

Expected behavior

Normal streaming output.

Actual behavior

Exception as follows:

```
0603 11:27:18.631800 29219 pb_stub.cc:751] "Failed to process the request(s) for model 'tensorrt_llm_0_0', message: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None."
Received an error from server: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Encountered error: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Encountered error: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Exception ignored in: <function InferenceServerClient.__del__ at 0x7f6e5c42f910>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
  File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 265, in close
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2250, in close
  File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2231, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'
```
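
For context, the check that fires here is the Triton Python backend's decoupled-mode contract: execute must push responses through each request's response sender and return None rather than returning a list of responses. A minimal sketch of that contract (the output tensor name is illustrative):

```python
import numpy as np
import triton_python_backend_utils as pb_utils  # available inside the Python backend

class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # Stream as many partial responses as needed...
            out = pb_utils.Tensor("text_output", np.array([b"tok"], dtype=np.object_))
            sender.send(pb_utils.InferenceResponse(output_tensors=[out]))
            # ...then flag the final response so the client stream can finish.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Returning anything other than None here raises exactly the error above.
        return None
```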

Additional notes

In previous versions, this worked normally.

byshiue commented 3 weeks ago

Please share the detailed reproduction steps.

kisseternity commented 3 weeks ago

It turns out the backend field in the tensorrt_llm model's config.pbtxt should be set to tensorrtllm instead of the default python. Problem solved; closing this issue.
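
For anyone hitting the same message: the setting is the backend field at the top of the tensorrt_llm model's config (the path below assumes the usual all_models/inflight_batcher_llm layout; adjust to your model repository):

```
# all_models/inflight_batcher_llm/tensorrt_llm/config.pbtxt
name: "tensorrt_llm"
backend: "tensorrtllm"  # was "python", which ran the model under the Python
                        # backend and tripped the decoupled-mode check above
```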