tikikun opened this issue 7 months ago
Relevant docs can be found here.
https://nvidia.github.io/TensorRT-LLM/batch_manager.html#get-and-send-callbacks
Inflight batching is currently the most beneficial feature a CUDA-based system can offer for LLM inference; it enables very high throughput by admitting new requests into the running batch as soon as earlier requests finish, instead of waiting for the whole batch to drain.
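To make the throughput benefit concrete, here is a minimal toy scheduler sketching the idea behind inflight (continuous) batching. It is illustrative only, not TensorRT-LLM's implementation: the real batch manager pulls new work and returns finished sequences through the get/send callbacks described in the linked docs, while this sketch just models slot reuse per decode step.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Request:
    rid: int
    tokens_needed: int  # tokens this request must generate
    generated: int = 0  # tokens produced so far

def inflight_batching(requests, max_batch_size):
    """Toy inflight-batching loop: each step generates one token per
    active request, retires finished requests, and immediately admits
    waiting requests into the freed slots."""
    waiting = deque(requests)
    active, finished, steps = [], [], 0
    while waiting or active:
        # Admit new work as soon as a slot frees up -- unlike static
        # batching, we never wait for the entire batch to complete.
        while waiting and len(active) < max_batch_size:
            active.append(waiting.popleft())
        steps += 1
        for r in active:
            r.generated += 1
        still_active = []
        for r in active:
            (finished if r.generated >= r.tokens_needed
             else still_active).append(r)
        active = still_active
    return steps, [r.rid for r in finished]
```

With three requests needing 2, 5, and 3 tokens and a batch size of 2, this finishes in 5 decode steps; a static batcher running [2, 5] then [3] would need 5 + 3 = 8 steps for the same work.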
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.