janhq / cortex.tensorrt-llm

Cortex.Tensorrt-LLM is a C++ inference library that any server can load at runtime. It includes NVIDIA’s TensorRT-LLM as a submodule for GPU-accelerated inference on NVIDIA GPUs.
https://cortex.jan.ai/docs/cortex-tensorrt-llm
Apache License 2.0
37 stars, 2 forks

feat: tiktoken integration #60

Closed: nguyenhoangthuan99 closed this 2 months ago

nguyenhoangthuan99 commented 2 months ago

Issue: https://github.com/janhq/cortex.tensorrt-llm/issues/49