triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Can I enable streaming on an ensemble model? #155

Open flexwang opened 1 year ago

flexwang commented 1 year ago

In the ensemble model example for gpt, can I change the fastertransformer model to a decoupled model and enable streaming on the client side?

jjjjohnson commented 1 year ago

+1

flexwang2 commented 1 year ago

The answer is yes: set the fastertransformer model to decoupled mode and use a streaming-capable client.
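For reference, a minimal sketch of what enabling decoupled mode might look like in the fastertransformer model's `config.pbtxt` (a hypothetical fragment based on Triton's general model-configuration schema, not taken from this repo's example; the ensemble config itself stays unchanged):

```
# config.pbtxt fragment for the fastertransformer model (sketch)
name: "fastertransformer"
backend: "fastertransformer"
max_batch_size: 1

# Decoupled transaction policy lets the model send zero or more
# responses per request, which is what token streaming needs.
model_transaction_policy {
  decoupled: true
}
```

On the client side, a decoupled model is typically consumed over gRPC with the Python client's streaming API (`start_stream` with a callback, then `async_stream_infer`), since decoupled models cannot be called through the ordinary synchronous HTTP `infer` path.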

jjjjohnson commented 1 year ago

It looks like only the FT backend supports streaming; the Python backend does not.