System Info
GPU: H100
Using the latest v0.9.0 code and image from NGC
Who can help?
No response
Information
[ ] The official example scripts
[ ] My own modified scripts
Tasks
[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)
Reproduction
Build the Llama 2 13B engine.
Launch the Triton server in decoupled mode.
Use the inflight_batcher_llm/client/inflight_batcher_llm_client.py script to test streaming mode.
Expected behavior
Normal streaming output.
Actual behavior
Exception as follows:
0603 11:27:18.631800 29219 pb_stub.cc:751] "Failed to process the request(s) for model 'tensorrt_llm_0_0', message: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None."
Received an error from server:
Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Encountered error: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Encountered error: Python model 'tensorrt_llm_0_0' is using the decoupled mode and the execute function must return None.
Exception ignored in: <function InferenceServerClient.__del__ at 0x7f6e5c42f910>
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 257, in __del__
File "/usr/local/lib/python3.10/dist-packages/tritonclient/grpc/_client.py", line 265, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2250, in close
File "/usr/local/lib/python3.10/dist-packages/grpc/_channel.py", line 2231, in _close
AttributeError: 'NoneType' object has no attribute 'StatusCode'
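The error message reflects Triton's contract for Python-backend models in decoupled mode: execute() must stream results back through each request's response sender and return None, rather than returning a list of responses. A non-runnable pseudocode sketch of that contract (triton_python_backend_utils is only available inside the Triton server, and generate_tokens / make_inference_response are hypothetical helpers):

```
# Pseudocode: decoupled-mode execute() in Triton's Python backend.
# Responses go through the request's response sender, not the return value.
class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            for partial in generate_tokens(request):      # hypothetical helper
                sender.send(make_inference_response(partial))
            # Signal end-of-stream for this request.
            sender.send(flags=TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        return None  # returning anything else triggers the error above
```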
It turns out the backend setting in the tensorrt_llm model's config.pbtxt should be set to tensorrtllm instead of the default python. Problem solved; closing this issue.
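For reference, the fix is a one-line change in the tensorrt_llm model's config.pbtxt (Triton model configuration; the rest of the file is unchanged):

```
backend: "tensorrtllm"
```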
Additional notes
In previous versions, streaming worked normally.