Open vansangpfiev opened 2 weeks ago
We only install an engine variant for a command/request.
Option 2 can be supported straight forward by adding --cpu
flag.
Option 3 is more complicated.
CLI: need to support all flags check for hardware support: --cpu
, --gpu
, --avx
, ...
The priority be the same as is:
GPU > CPU; AVX512 > AVX2 > AVX > NOAVX
For example, if we want to install cpu avx2, we need to execute command: cortex engines install llama-cpp --cpu --avx2
But cortex engines install llama-cpp --avx2
implicitly means cortex will select gpu or cpu variant.
API: example for a request
{
"cpu": ["noavx", "avx", "avx2", ...], // empty means cortex auto select
"gpu": {
"cuda": ["noavx", "avx", "avx2", "avx512"],
"vulkan": [],
...
}
}
Option 3 can be extended from option 2 implementation. cc: @dan-homebrew @namchuai @louis-jan @hiento09
Problem Statement
cortex
is selecting the best option base on user's hardware information. The priority is : GPU > CPU; AVX512 > AVX2 > AVX > NOAVX User can not switch between binaries.Feature Idea
@hiento09 suggests 3 options for
engines install
: Option 1: no flag, install engine by current priority: GPU > CPU; AVX512 > AVX2 > AVX > NOAVX Option 2: allow forcing CPU or GPU,cortex
still detects CPU instruction, for example add flag--cpu
or--gpu
Option 3: user can select specific binary to installWhat do you think? @dan-homebrew @0xSage @namchuai @louis-jan @nguyenhoangthuan99