janhq / cortex.cpp

Run and customize Local LLMs.
https://cortex.so
Apache License 2.0

epic: Cortex Hardware API #1165

Open · dan-homebrew opened this issue 1 week ago

dan-homebrew commented 1 week ago

Goal

Tasklist

Context

Cortex.cpp's Hardware API should enable us to do this in Jan

(screenshot attached)

dan-homebrew commented 1 week ago

@louis-jan I'm assigning this to you in Sprint 20, as this has a significant CLI and API design component.

EDIT: adding @nguyenhoangthuan99 for implementation

dan-homebrew commented 3 days ago

@louis-jan @nguyenhoangthuan99 I am going to move this to Sprint 21, as I think you guys should land the Model Folder and model.yaml first.

nguyenhoangthuan99 commented 1 day ago

The hardware detection serves two main purposes:

1. Selecting the appropriate engine version to install for the user's hardware.
2. Determining which models can run efficiently on that hardware.

To achieve these goals, make debugging easier, and help users choose the appropriate model, the hardware API/CLI should provide the following information:

example return body:

{
  "os": "windows",
  "arch": "amd64",
  "suitable_avx": "avx2",
  "free_memory": 8192,
  "gpu_info": [
    {
      "id": "0",
      "name": "NVIDIA GeForce RTX 3090",
      "arch": "ampere",
      "driver_version": "552.12",
      "cuda_driver_version": "12.4",
      "compute_cap": "8.6",
      "free_vram": 8192
    }
  ]
}
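
The `suitable_avx` field could be derived at runtime from CPU feature flags. A minimal sketch, assuming a GCC/Clang toolchain (`__builtin_cpu_supports` is a compiler builtin, not portable to MSVC, which would need `__cpuidex` instead):

```cpp
#include <string>

// Sketch: report the highest AVX level the running CPU supports.
std::string SuitableAvx() {
  if (__builtin_cpu_supports("avx512f")) return "avx512";
  if (__builtin_cpu_supports("avx2")) return "avx2";
  if (__builtin_cpu_supports("avx")) return "avx";
  return "none";
}
```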

Note: getting the free VRAM information from C++ is challenging and requires further investigation (the current approach is to parse the output of the nvidia-smi command). This information would allow the system to make informed decisions about which engine version to install and which models can run efficiently on the user's hardware. It also provides valuable data for debugging. cc @louis-jan for a recommendation from the Jan app side for easier integration
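
For reference, a rough sketch of that nvidia-smi parsing approach (the query flags are standard nvidia-smi options; error handling is simplified, and on Windows `popen`/`pclose` would be `_popen`/`_pclose`):

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Sketch: query free VRAM (MiB) per GPU by parsing nvidia-smi output.
// Returns an empty vector if nvidia-smi is unavailable.
std::vector<int> GetFreeVramMib() {
  std::vector<int> result;
  FILE* pipe = popen(
      "nvidia-smi --query-gpu=memory.free --format=csv,noheader,nounits",
      "r");
  if (!pipe) return result;
  char line[128];
  while (fgets(line, sizeof(line), pipe)) {
    result.push_back(std::atoi(line));  // one value per GPU, in MiB
  }
  pclose(pipe);
  return result;
}
```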

louis-jan commented 1 day ago

From Jan, we expect to have just enough information to select a corresponding engine version / settings, such as CPU instructions / GPUs.

But we need to gather comprehensive hardware information for debugging, including CPU, GPU, RAM, OS, and connected monitors (as issues like projector connections have been known to impact performance).

Structure

To make user support easier, the hardware information should be grouped for quick lookup; a mix of flattened and grouped structures can be visually overwhelming.

E.g. the supporter has to scroll to the bottom of the file to see `os`:

```json
{
  "arch": "",
  "free_memory": "",
  "gpus": [ {}, {}, {} ],
  "os": ""
}
```

```json
{
  "device": {
    "arch": "",
    "free_memory": "",
    "os": ""
  },
  "gpus": [ {}, {}, {} ]
}
```

✅✅

```json
{
  "cpu": {
    "arch": "x64",
    "cores": "4",
    "model": "Intel Core i9 12900K",
    "instructions": [ "AVX512", "FMA", "SSE" ]
  },
  "os": {
    "version": "10.2",
    "name": "Windows 10 Pro"
  },
  "power": {
    "battery_life": 80,
    "charging_status": "charged",
    "is_power_saving": false
  },
  "ram": {
    "total": "16",
    "available": "12",
    "type": "DDR4" // better model name?
  },
  "storage": {
    "total": 512,
    "available": 256,
    "type": "SSD" // better model name?
  },
  "gpus": [ {}, {}, {} ],
  "monitors": []
}
```

Consistent from system to system

Different devices should produce the same output format, e.g. for GPU driver info. There should not be a different response body structure per GPU family.

E.g.

```json
"graphics": [
  {
    "id": "0",
    "name": "NVIDIA GeForce RTX 3090",
    "driver_version": "552.12",
    "cuda_driver_version": "12.4",
    "compute_cap": "8.6",
    "free_vram": 8192
  },
  {
    "id": "1",
    "name": "AMD Radeon RX 6800 XT",
    "driver_version": "5.0.2?",
    "cuda_driver_version": "?",
    "compute_cap": "?",
    "free_vram": 8192
  }
]
```

```json
"graphics": [
  {
    "id": "0",
    "name": "NVIDIA GeForce RTX 3090",
    "version": "12.4",
    "additional_information": {
      "driver_version": "552.12",
      "compute_cap": "8.6"
    },
    "free_vram": 8192,
    "total_vram": 8192
  },
  {
    "id": "1",
    "name": "AMD Radeon RX 6800 XT",
    "version": "6.1",
    "free_vram": 8192,
    "total_vram": 8192,
    "additional_information": {
      "rocm_git_revision": "0d0a7a10c1a3"
    }
  }
]
```

Try to gather anything that could affect performance.

Request

It would be beneficial to have filter query support, allowing clients to poll only for the data they need, e.g. `?filters=gpu,cpu`.
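
A small server-side sketch of such filtering, with hypothetical names, assuming the full report is already assembled as a section-to-JSON map:

```cpp
#include <map>
#include <set>
#include <sstream>
#include <string>

// Sketch: given the full hardware report as section -> serialized JSON,
// keep only the sections named in a "?filters=gpu,cpu" query value.
// An empty filter string returns everything.
std::map<std::string, std::string> ApplyFilters(
    const std::map<std::string, std::string>& full_report,
    const std::string& filters) {
  if (filters.empty()) return full_report;
  std::set<std::string> wanted;
  std::stringstream ss(filters);
  std::string section;
  while (std::getline(ss, section, ',')) wanted.insert(section);
  std::map<std::string, std::string> result;
  for (const auto& [key, value] : full_report) {
    if (wanted.count(key)) result[key] = value;
  }
  return result;
}
```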

@nguyenhoangthuan99 @dan-homebrew