janhq / cortex.tensorrt-llm

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It incorporates NVIDIA's TensorRT-LLM as a git submodule for GPU-accelerated inference on NVIDIA GPUs.
https://cortex.jan.ai/docs/cortex-tensorrt-llm
Apache License 2.0

feat: use batch-manager instead of gpt-runtime #51

Closed — vansangpfiev closed this issue 2 months ago

vansangpfiev commented 4 months ago
vansangpfiev commented 2 months ago

Implemented by: https://github.com/janhq/cortex.tensorrt-llm/pull/71