From my perspective, we should download the CUDA toolkit separately. We support multiple engines, cortex.llamacpp and cortex.tensorrt-llm, and both need the CUDA toolkit to run. CUDA is backward compatible, so we only need the latest CUDA toolkit version supported by the installed nvidia-driver version. For example:
Edit: I just checked the CUDA compatibility matrix, and it is not correct that CUDA is always backward compatible.
Related ticket: https://github.com/janhq/cortex/issues/1047
Edit 2: The image above shows forward compatibility between CUDA and the NVIDIA driver version.
> From CUDA 11 onwards, applications compiled with a CUDA Toolkit release from within a CUDA major release family can run [...]

So yes, CUDA is backward compatible within a CUDA major release. Reference: https://docs.nvidia.com/deploy/cuda-compatibility/#minor-version-compatibility
I also think we need to download the CUDA toolkit separately. Both tensorrt-llm and llamacpp require CUDA. In addition, the tensorrt-llm package (~1 GB) does not include the CUDA toolkit libraries (cublas, cusparse, ..., which are very heavy, ~400 MB), so if we decide to pack everything into one package for both tensorrt-llm and llamacpp, the size will increase.
I'm referring to this table to check compatibility between the driver and the toolkit: https://docs.nvidia.com/deeplearning/cudnn/latest/reference/support-matrix.html#gpu-cuda-toolkit-and-cuda-driver-requirements
Can I verify my understanding of the issue:
Decision: For NVIDIA GPU users, the different engines have CUDA dependencies that are large (200-400 MB) downloads.
My initial thoughts
Each engine ships its own `cudart` files that have been verified. This will be disk-space inefficient. However, the alternative seems to be dependency hell, which I think is even worse.
Folder Structure
cortex.llama.cpp is a separate module; it should ideally be independent and packaged with its dependencies:

```
/cortex
  /engines
    /llama.cpp-extension
      /deps            # CUDA dlls
    /tensorrt-llm-extension
      /deps            # CUDA dlls
```
That said, I'm open to all ideas, especially @vansangpfiev's.
If disk-space inefficiency is acceptable, I think we can go with option 1. Please note that we will have some blockers for this option:

- Dynamic library search path: we will have two paths, one for llamacpp and one for tensorrt-llm, and a potential issue can happen when we mix them together. For Ubuntu and macOS, I think we can solve that issue by compiling with the rpath flag (see the sketch after this list). For Windows, I have created a related issue.
- CI changes for cortex.llamacpp and cortex.tensorrt-llm to pack the CUDA dependencies.
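A minimal sketch of the rpath approach, assuming the engine binary sits next to a `deps/` folder (file names and library list are illustrative):

```sh
# Linux (ELF): $ORIGIN resolves to the directory containing the binary,
# so the loader also searches ./deps next to the engine at run time.
g++ engine.cpp -o engine -L./deps -lcudart -Wl,-rpath,'$ORIGIN/deps'

# macOS (Mach-O): @loader_path plays the same role as $ORIGIN.
clang++ engine.cpp -o engine -L./deps -lcudart -Wl,-rpath,@loader_path/deps
```

A relative rpath keeps the engine folder relocatable, which matters if engines are meant to be portable.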
Thanks @vansangpfiev and @dan-homebrew
I'm confirming that we agree with:

Question 1: Packaging CUDA toolkit dependencies into the corresponding engine. Caveats:
Question 2: Storing CUDA dependencies under corresponding engines.
```
/cortex
  /engines
    /cortex.llamacpp
      /deps            # CUDA dlls
    /cortex.tensorrt-llm
      /deps            # CUDA dlls
```
Caveats:
Additional thought, @vansangpfiev: when we change the CI for an engine, could we associate a file that contains the engine version and info about its dependencies? This would help an `engines list` command in the future. WDYT? cc @nguyenhoangthuan99
1. What if llamacpp vs tensorrt-llm dependencies start to conflict?
2. Do we care about engine portability? And does doing a dynamic library search path on Windows affect portability?
3. How will we do maintenance and updates? i.e.
4. Is this a dumb idea: store CUDA dependencies in a central location, such as a separate deps directory at the project root, and then use symbolic links or environment variables to point to the engine-specific dependencies (sketched below). For example:
```
/.cortex
  /deps
    /cuda
      cuda-11.5        # or whatever versioning
  /engines
    /cortex.llamacpp
      /bin
    /cortex.tensorrt-llm
      /bin
```
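A minimal sketch of the symlink variant of this idea (hypothetical paths; note this layout was not adopted, see the replies below):

```sh
# Hypothetical: each engine's deps folder is a symlink into the shared CUDA bundle
ln -s ../../deps/cuda/cuda-11.5 .cortex/engines/cortex.llamacpp/deps
ln -s ../../deps/cuda/cuda-11.5 .cortex/engines/cortex.tensorrt-llm/deps
```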
@0xSage, here's my thought. Please correct me if I'm wrong, @nguyenhoangthuan99 @vansangpfiev.
For 1, `cortex.llamacpp` and `cortex.tensorrt-llm` each keep their dependencies in their own folder, so they should not conflict:

```
/engines
  /cortex.llamacpp
    /deps
  /cortex.tensorrt-llm
    /deps
```
For 2, regarding the dynamic library search path: on Windows, the program will search for DLLs at the current path (IIRC, the same path as the executable). But since we are about to put the dependencies under `cortex.llamacpp/deps` and `cortex.tensorrt-llm/deps`, we need to tell the OS where to look for the DLLs. This might cost us some effort to handle properly? I'm not sure. WDYT? @vansangpfiev @dan-homebrew @nguyenhoangthuan99 (A sketch of one option follows below.)

For 3, I think we can do the maintenance and updates by versioning: generate a file (for example version.txt) for each release, which has metadata for the engine version and CUDA version. We will update the CUDA dependencies if needed.

For 4, I think it is easier for us to locate all CUDA dependencies in the same folder as the engine, because then we don't need to check which CUDA version is used by which engine version.
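On the Windows DLL search question in point 2, a minimal sketch of telling the OS where to look, assuming a per-engine `deps` folder (not necessarily how Cortex ends up doing it):

```cpp
#include <windows.h>

// Add an engine's deps folder to the DLL search path before any CUDA DLL
// is loaded. Requires Windows 8+ (or Windows 7 with KB2533623).
bool AddEngineDepsDir(const wchar_t* absDepsPath) {
    // Search only safe defaults plus directories added via AddDllDirectory.
    if (!SetDefaultDllDirectories(LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)) {
        return false;
    }
    // AddDllDirectory requires an absolute path; returns nullptr on failure.
    return AddDllDirectory(absDepsPath) != nullptr;
}

// Illustrative usage (path and DLL name are hypothetical):
//   AddEngineDepsDir(L"C:\\cortex\\engines\\cortex.llamacpp\\deps");
//   LoadLibraryW(L"cudart64_12.dll");  // now also searches that folder
```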
@vansangpfiev @namchuai @0xSage Quick responses:
> Is this a dumb idea: store CUDA dependencies in a central location, such as a separate deps directory at the project root, and then use symbolic links or environment variables to point to the engine-specific dependencies.
I also agree with @vansangpfiev: let's co-locate all CUDA dependencies with the engine folder.
Simple > complex, especially since model files are >4 GB.
> For 3, I think we can do the maintenance and updates by versioning: generate a file (for example version.txt) for each release, which has metadata for the engine version and CUDA version. We will update the CUDA dependencies if needed.
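For illustration, such a metadata file could look something like this (hypothetical format, field names, and version values; nothing here is decided):

```
# version.txt (hypothetical), shipped per engine release
engine: cortex.llamacpp
engine_version: 0.1.25
cuda_version: 12.4
deps: cudart64_12.dll, cublas64_12.dll
```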
I also think we need to think through the CLI and API commands, e.g.:

- `cortex engines update tensorrt-llm`
- `PUT <API URL>?`
I wonder whether it is better for us to have clearer naming for Cortex engines:

- llamacpp-engine
- onnx-engine
- tensorrt-llm-engine
This articulates the concept of Cortex engines more clearly. Hopefully, with a clear API, the community can also step in to help build backends.
We would need to reason through cortex.python separately. cortex.python might be more of a "Python base template", where an Engine Extension dev can define the Python version to bundle.
Motivation
Do we package the CUDA toolkit into the engine?

- Yes? Then we will have to do the same for llamacpp, tensorrt-llm, and onnx.
- No? Then it will be downloaded separately.

Folder structures (e.g. if the user has llamacpp and tensorrt-llm at the same time)?
Resources: Llamacpp release. Currently we are downloading the toolkit dependency via:

`https://catalog.jan.ai/dist/cuda-dependencies/<version>/<platform>/cuda.tar.gz`
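A sketch of that download flow (the `<version>` and `<platform>` values below are purely illustrative, and the target folder is an assumption):

```sh
# Fetch and unpack the CUDA dependency bundle next to the engine
curl -L -o cuda.tar.gz \
  "https://catalog.jan.ai/dist/cuda-dependencies/12.4/windows-amd64/cuda.tar.gz"
tar -xzf cuda.tar.gz -C engines/cortex.llamacpp/deps
```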
cc @vansangpfiev @nguyenhoangthuan99 @dan-homebrew
Update sub-tasks: