Previously reported in https://github.com/speechmatics/ctranslate2_triton_backend/issues/2#issuecomment-1546889761 by @aamir-s18
Streaming mode could be useful for very large models. It would help in real-time use cases, where we can improve the user experience by returning one token at a time.
Triton Server supports streaming with decoupled models.
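A minimal sketch of what a decoupled Python-backend model could look like, sending one response per decoded token. This assumes the model config enables `model_transaction_policy { decoupled: true }`; the output name `OUTPUT_TOKEN` and the `stream_tokens` helper are placeholders, not part of the existing backend:

```python
import numpy as np
import triton_python_backend_utils as pb_utils


class TritonPythonModel:
    def execute(self, requests):
        for request in requests:
            sender = request.get_response_sender()
            # stream_tokens() is a hypothetical helper wrapping the
            # CTranslate2 decoding loop (see the sketch further below).
            for token in stream_tokens(request):
                tensor = pb_utils.Tensor(
                    "OUTPUT_TOKEN",
                    np.array([token.encode("utf-8")], dtype=np.object_),
                )
                # Send one token back to the client immediately.
                sender.send(pb_utils.InferenceResponse(output_tensors=[tensor]))
            # Signal that no further responses follow for this request.
            sender.send(flags=pb_utils.TRITONSERVER_RESPONSE_COMPLETE_FINAL)
        # Decoupled models return None from execute(); responses go
        # through the response sender instead.
        return None
```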
It needs to be investigated how CTranslate2 can be used to get decoded tokens one by one. Additionally, this might be trickier in a beam decode setting, unless we are willing to always return the current best hypothesis, which could flip previously emitted words.
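For the CTranslate2 side, the Python API exposes a per-step `callback` on `translate_batch` that is invoked once per generated token, but only for greedy search (`beam_size=1`), which already hints at the beam-search limitation above. A minimal sketch, with the model directory and source tokens as placeholders:

```python
import ctranslate2

translator = ctranslate2.Translator("model_dir")  # placeholder path


def on_token(step_result):
    # step_result is a ctranslate2.GenerationStepResult; .token is the
    # newly decoded target token for this step.
    print(step_result.token, end=" ", flush=True)
    return False  # returning True would stop decoding for this entry


translator.translate_batch(
    [["▁Hello", "▁world"]],  # pre-tokenized source (placeholder)
    beam_size=1,             # the step callback only fires with greedy search
    callback=on_token,
)
```

With beam search the callback is not available, so streaming there would mean emitting the best hypothesis so far at each step and accepting that earlier tokens may be revised.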