feat: TensorRT-LLM load multiple models

janhq / cortex.tensorrt-llm

Cortex.Tensorrt-LLM is a C++ inference library that can be loaded by any server at runtime. It includes NVIDIA’s TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.

https://cortex.jan.ai/docs/cortex-tensorrt-llm

Apache License 2.0


feat: TensorRT-LLM load multiple models #33

Open tikikun opened 6 months ago

tikikun commented 6 months ago

Currently, the engine can only run one model at a time.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.