triton-inference-server / fastertransformer_backend

BSD 3-Clause "New" or "Revised" License

Questions about model instances and dynamic batch when setting model concurrency #112

Closed YJHMITWEB closed 1 year ago

YJHMITWEB commented 1 year ago

Hi, I'd like to know: for example, when enabling model concurrency = 2, does Tritonserver run 2 streams for processing requests? Versus that, if using dynamic batching, is there just 1 stream, with all requests packed into one batch? And how are model instances related to these two settings?
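For context, both behaviors asked about here are controlled in Triton's per-model `config.pbtxt`: `instance_group` sets how many model instances (concurrency) run, and `dynamic_batching` packs queued requests into one batch. A minimal sketch, with illustrative values (the field names are real Triton config fields, but the counts and delay are placeholder choices):

```protobuf
# config.pbtxt (sketch) -- two model instances on GPU 0,
# each instance can serve requests concurrently
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Dynamic batching: queued requests are merged into one batch
# per instance, up to the preferred sizes or the queue delay
dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}
```

The two settings compose: with `count: 2` and dynamic batching enabled, Triton forms batches from the request queue and dispatches them to whichever of the two instances is free, rather than running a single serialized stream.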

byshiue commented 1 year ago

You can ask this question in the tritonserver repo, because this behavior is determined by tritonserver itself, not by the backend.

YJHMITWEB commented 1 year ago

Thanks!