Open · endingback opened 1 month ago
When an image is sent to a multimodal large model, how can the generated text be returned as a stream? The model itself already supports streaming generation; how does Triton Server support streaming responses?
Hello! The vllm_backend may help you understand the streaming concept: https://github.com/triton-inference-server/vllm_backend/blob/main/src/model.py
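For context: Triton streams results by running the model in *decoupled* mode (set `model_transaction_policy { decoupled: True }` in `config.pbtxt`), and the Python backend then sends one response per generated token through the request's response sender, ending with the `TRITONSERVER_RESPONSE_COMPLETE_FINAL` flag. This is the pattern the vllm_backend `model.py` follows. Below is a minimal, runnable sketch of that loop; the `send`/`finalize` callbacks stand in for Triton's real sender API (noted in the comments), since an actual `model.py` only runs inside the server:

```python
# Hedged sketch of Triton's decoupled (streaming) response pattern.
# In a real Triton Python backend, model.py imports
# triton_python_backend_utils as pb_utils, and each request exposes
# request.get_response_sender(). Here the sender is modeled as a plain
# callback so the streaming loop itself can run standalone.

def stream_response(token_iter, send, finalize):
    """Emit each generated token as its own response, then signal completion."""
    for token in token_iter:
        # In Triton this would be:
        #   response_sender.send(pb_utils.InferenceResponse(output_tensors=[...]))
        send(token)
    # In Triton this would be:
    #   response_sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
    finalize()

if __name__ == "__main__":
    # Standalone demo: a fake token generator and a list-collecting sender.
    out = []
    stream_response(iter(["Hello", " world"]), out.append, lambda: out.append("<DONE>"))
    print(out)
```

On the client side, the matching pattern is `tritonclient.grpc`'s `start_stream()` / `async_stream_infer()`, which delivers each partial response to a callback as it arrives.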