chore: Improve package/binary size by remove jinja2

nguyenhoangthuan99 commented 2 months ago

The jinja2 make the binary double size because contains too many deps from boost. Because jinja2 only parse model from gguf model. We can remove this part and pass to cortex.llamacpp to handle to reduce size

0xSage commented 2 months ago

@nguyenhoangthuan99 can you elaborate on this issue? Are you talking about only including jinja2 in cortex.llamacpp, instead of the overall cortexcpp packager or soemthing else?

nguyenhoangthuan99 commented 2 months ago

Problem

cortex-cpp using Jinja2-cpp lib to parse chat format from GGUF file. This will help us to run model from every source.
The jinja2-cpp is the only cpp lib that can support render jinja2 template that can build multi platform, but it has many deps from boosts.
Llama.cpp also support parse these jinja2-template to chat format internally -> if we want to use that feature we have to build llamacpp along with cortex-cpp -> this is also not recommended because the llamacpp repo is too large.

Solution

All model with gguf file format only run with cortex.llamacpp engines, for that reason, we will move the part parse chat template for cortex.llamacpp engines. And this part will be executed during runtime (when user start a model using cortex.llamacpp engine, it will parse chat template).
This solution require more effort and can save 60 Mb of binary file.

gabrielle-ong commented 1 month ago

closing issue, thanks @nguyenhoangthuan99

janhq / cortex.cpp

chore: Improve package/binary size by remove jinja2 #1063