janhq / cortex.cpp

Local AI API Platform
https://cortex.so
Apache License 2.0
2.13k stars 125 forks

epic: Implement Cortex Hardware API for Nvidia #1568

Closed vansangpfiev closed 2 days ago

vansangpfiev commented 3 weeks ago

Implementation for https://github.com/janhq/cortex.cpp/issues/1165

Tasklist

(Will fill in details when implementing each task)

Hardware API

/engines

/model/start

Jan

Bugs to Address

Related bugs:

Out-of-scope

vansangpfiev commented 3 weeks ago

Hardware API Documentation

Get hardware information

GET /v1/hardware

Response:

{
  "cpu": {
    "arch": "string",
    "cores": number,
    "model": "string",
    "instructions": ["string"]
  },
  "os": {
    "version": "string",
    "name": "string"
  },
  "ram": {
    "total": number,
    "available": number,
    "type": "string"
  },
  "storage": {
    "total": number,
    "available": number,
    "type": "string"
  },
  "gpus": [
    {
      "model": "string",
      "vram": "string",
      "driver_version": "string"
    }
  ],
  "power": {
    "battery_life": number,
    "charging_status": "string",
    "is_power_saving": boolean
  },
  "monitors": [
    {
      "resolution": "string",
      "refresh_rate": number
    }
  ]
}
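
As a rough illustration of how a client might consume this response, here is a minimal Python sketch. The sample payload is hypothetical: every value in it is invented to match the field types above, and the units of the numeric fields are not stated in this thread. The helper names `missing_sections` and `gpu_models` are also made up for this example and are not part of Cortex.

```python
import json

# Hypothetical sample shaped like the schema above; all values are invented.
SAMPLE = json.loads("""
{
  "cpu": {"arch": "amd64", "cores": 8, "model": "example-cpu", "instructions": ["avx2"]},
  "os": {"version": "example-version", "name": "example-os"},
  "ram": {"total": 16384, "available": 8192, "type": "example"},
  "storage": {"total": 512000, "available": 256000, "type": "example"},
  "gpus": [{"model": "example-gpu", "vram": "8192", "driver_version": "551.76"}],
  "power": {"battery_life": 80, "charging_status": "charging", "is_power_saving": false},
  "monitors": [{"resolution": "1920x1080", "refresh_rate": 60}]
}
""")

# Top-level sections listed in the documented response.
EXPECTED_SECTIONS = ("cpu", "os", "ram", "storage", "gpus", "power", "monitors")


def missing_sections(payload):
    """Return the documented top-level sections that the payload lacks."""
    return [name for name in EXPECTED_SECTIONS if name not in payload]


def gpu_models(payload):
    """Return the model string of every GPU entry, in response order."""
    return [gpu["model"] for gpu in payload.get("gpus", [])]
```

A client could call `missing_sections` first and treat a non-empty result as a sign that the server returned a partial or older response shape.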

Hardware Activation

POST /v1/hardware/activate

{
  "gpus": [0, 1]
}
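
For reference, the request above could be built from Python roughly as follows. This is only a sketch: the base URL (host and port) is an assumption, since the thread does not state where the server listens, and `build_activate_request` is a hypothetical helper. The request is constructed but not sent.

```python
import json
import urllib.request

# Assumption: the server address is not given in this thread; adjust as needed.
BASE_URL = "http://127.0.0.1:39281"


def build_activate_request(gpus, base_url=BASE_URL):
    """Build (but do not send) a POST /v1/hardware/activate request."""
    body = json.dumps({"gpus": list(gpus)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v1/hardware/activate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_activate_request([0, 1])
# To actually send it against a running server: urllib.request.urlopen(req)
```
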
dan-homebrew commented 3 weeks ago

Thanks @vansangpfiev. Will we be implementing deactivate this sprint?

vansangpfiev commented 3 weeks ago

Since we already have the /activate endpoint, I think adding /deactivate would be redundant. By default, all GPUs are activated. A call to /activate deactivates every GPU that is not listed in the request.
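
The activate-only semantics described here can be sketched in a few lines of Python. This is an illustration of the rule, not the actual Cortex implementation; `apply_activation` is a hypothetical name, and rejecting unknown indices with an error is an assumption modeled on the "Invalid GPU index provided." message seen later in this thread.

```python
def apply_activation(detected, requested):
    """Return (activated, deactivated) GPU indices for one /activate request.

    detected:  indices of all GPUs present on the machine
    requested: indices listed in the request's "gpus" array
    """
    detected_set = set(detected)
    requested_set = set(requested)
    unknown = sorted(requested_set - detected_set)
    if unknown:
        # Assumption: an index that does not exist is rejected outright.
        raise ValueError(f"Invalid GPU index provided: {unknown}")
    activated = sorted(requested_set)
    # Every detected GPU that is not in the request ends up deactivated.
    deactivated = sorted(detected_set - requested_set)
    return activated, deactivated
```

With two GPUs present, requesting `[1]` activates GPU 1 and deactivates GPU 0, while requesting `[]` deactivates both, which is why a separate /deactivate endpoint is not needed.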

dan-homebrew commented 2 weeks ago

A few notes from our quick call:

Hardware Support

We will need to work with multiple hardware providers, but these can be dealt with in separate sprints:

ngl settings

vansangpfiev commented 1 week ago

CLI Documentation:

Get hardware information

cortex hardware list --cpu --os --ram --storage --gpu --power --monitors

If no flag is specified, display all hardware information

Activate hardware

cortex hardware activate --gpus [gpu_list]

gpu_list is required; [] deactivates all GPUs.
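
To make the role of the empty list concrete, here is a small Python sketch of how a gpu_list argument could be parsed so that [] is accepted as valid input. It assumes the list is written in JSON-like form such as [0,1]; the real CLI's parsing rules are not shown in this thread, and `parse_gpu_list` is a hypothetical helper.

```python
import json


def parse_gpu_list(text):
    """Parse a gpu_list string such as "[0, 1]" or "[]" into a list of ints.

    Raises ValueError for anything that is not a list of non-negative integers.
    """
    value = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(value, list):
        raise ValueError("gpu_list must be a list such as [0, 1] or []")
    for item in value:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(item, bool) or not isinstance(item, int) or item < 0:
            raise ValueError(f"invalid GPU index: {item!r}")
    return value
```

The point of the sketch is that an empty list is a legitimate value (deactivate everything), distinct from a malformed or missing argument.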

Start model

cortex start [model_id] --gpus [gpu_list]

--gpus is optional; if not specified, all activated GPUs are used.

Run

cortex run [model_id] --gpus [gpu_list]

--gpus is optional; if not specified, all activated GPUs are used.
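
The default behavior of --gpus for start and run can be sketched with argparse. This is an illustration only: the parser below is not the Cortex CLI, `resolve_gpus` is a hypothetical helper, and the JSON-like form of the list is an assumption. The thread also does not say what happens when a requested GPU is not currently activated, so the sketch does not model that case.

```python
import argparse
import json


def build_parser():
    """Build a stand-in parser with a positional model_id and an optional --gpus."""
    parser = argparse.ArgumentParser(prog="start-sketch")
    parser.add_argument("model_id")
    parser.add_argument("--gpus", default=None, help="GPU list, e.g. [0,1]")
    return parser


def resolve_gpus(argv, activated):
    """Return the GPUs a model would use, given CLI args and activated GPUs."""
    args = build_parser().parse_args(argv)
    if args.gpus is None:
        # --gpus omitted: fall back to every currently activated GPU.
        return list(activated)
    return json.loads(args.gpus)
```

For example, `resolve_gpus(["my-model"], [0, 1])` falls back to both activated GPUs, while passing `--gpus [1]` narrows the selection to GPU 1.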

gabrielle-ong commented 1 week ago

Nicely done @vansangpfiev! Testing it out now - 2 quick questions:

  1. I can't seem to deactivate the GPU to test without a GPU:
    cortex-nightly hardware activate --gpus []
    Invalid GPU index provided.
  2. The GPU information has Index=1, ID=0 for the same GPU, which is confusing. Can we standardize on Index like the other fields? [image]
vansangpfiev commented 1 week ago

Thanks @gabrielle-ong

  1. Let me take a look. Would you mind sharing the cortex.log and cortex-cli.log?
  2. Sure, let me fix it. Note that the ID is the GPU ID that nvidia-smi reports, and it can differ from the index.
gabrielle-ong commented 1 week ago

Thanks Sang! 2 - I see, understood. It would then help to make clear in the help command that this is the nvidia-smi ID.

1 - It just takes in the empty array; there are no error logs.

cortex-cli.log:

20241114 06:32:23.404000 UTC 13784 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 06:32:23.404000 UTC 18228 INFO  Will check for new update, time from last check: 2531 seconds - cortex_upd_cmd.cc:127
20241114 06:32:23.404000 UTC 18228 INFO  Engine release path: https://delta.jan.ai/cortex/latest/version.json - cortex_upd_cmd.cc:138
20241114 06:32:23.545000 UTC 18228 INFO  Got the latest release, update to the config file: v1.0.2-235 - cortex_upd_cmd.cc:175

cortex.log:

20241114 05:38:18.970000 UTC 3728 INFO  Origin:  - main.cc:160
20241114 05:38:19.139000 UTC 12684 INFO  Gpu Driver Version: 551.76 - utils/system_info_utils.h:116
20241114 05:38:19.279000 UTC 12684 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:38:19.484000 UTC 12684 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:38:19.531000 UTC 12684 INFO  Origin:  - main.cc:160
20241114 05:49:51.989000 UTC 7484 INFO  Origin:  - main.cc:160
20241114 05:49:51.989000 UTC 16792 INFO  activate: {
    "gpus" : 
    [
        0
    ]
}
 - hardware.cc:38
20241114 05:49:51.989000 UTC 16792 INFO  No hardware activation changes -> No need to update - hardware_service.cc:211
20241114 05:49:51.989000 UTC 16792 INFO  Origin:  - main.cc:160
20241114 05:50:00.401000 UTC 1384 INFO  Origin:  - main.cc:160
20241114 05:50:00.542000 UTC 4276 INFO  Gpu Driver Version: 551.76 - utils/system_info_utils.h:116
20241114 05:50:00.682000 UTC 4276 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:50:00.870000 UTC 4276 INFO  CUDA Version: 12.4 - utils/system_info_utils.h:141
20241114 05:50:00.964000 UTC 4276 INFO  Origin:  - main.cc:160
20241114 05:50:12.567000 UTC 17776 INFO  Origin:  - main.cc:160
20241114 05:50:12.567000 UTC 16092 INFO  activate: {
    "gpus" : 
    [
        1
    ]
}
 - hardware.cc:38
20241114 05:50:12.567000 UTC 16092 INFO  Origin:  - main.cc:160
20241114 05:50:37.058000 UTC 15972 INFO  Origin:  - main.cc:160
20241114 05:50:37.058000 UTC 11104 INFO  activate: {
    "gpus" : []
}
 - hardware.cc:38
20241114 05:50:37.058000 UTC 11104 INFO  Origin:  - main.cc:160
20241114 06:32:23.404000 UTC 11156 INFO  Origin:  - main.cc:160
20241114 06:32:23.404000 UTC 6656 INFO  activate: {
    "gpus" : []
}
 - hardware.cc:38
20241114 06:32:23.404000 UTC 6656 INFO  Origin:  - main.cc:160
vansangpfiev commented 1 week ago

@gabrielle-ong Can you please try again with nightly 236?

gabrielle-ong commented 2 days ago

Thanks Sang! Successfully activated and deactivated GPUs with the CLI and API; marking as complete.

Using GPU

Image

Using CPU

Image