janhq / cortex.cpp

Run and customize Local LLMs.
https://cortex.so
Apache License 2.0
1.91k stars, 105 forks

Discussion: Cortex.cpp Hardware Detection, Selection, and Memory Management #1089

Closed dan-homebrew closed 1 week ago

dan-homebrew commented 1 week ago

Overview

Note: We will probably need to break this discussion down into smaller topics:

Related

namchuai commented 1 week ago

I have some additional information on this subject. Please add more if you have any ideas. @vansangpfiev @nguyenhoangthuan99

  1. How do we detect the user's hardware?

  2. How do we detect GPUs (Nvidia, AMD, etc.)?

    • [x] We have to dump data from nvidia-smi (an executable from Nvidia that ships with the Nvidia driver, IIRC).
    • [ ] For AMD GPUs, we will dump data from vulkaninfoSDK. This executable is available online, so we need to either download it on demand or package it.
  3. Are we able to detect how much GPU VRAM the currently loaded models use (e.g. to prevent users from hitting OOM errors when loading a new model)? nvidia-smi does provide VRAM information. I'm not entirely sure about vulkaninfoSDK, though. I will keep this updated.
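
A minimal sketch of the VRAM check raised in point 3, assuming the input is the text printed by `nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv,noheader,nounits` (values in MiB). `GpuInfo`, `ParseNvidiaSmiCsv`, `FitsInVram`, and the 512 MiB headroom are hypothetical names and values for illustration, not cortex.cpp APIs. Spawning the executable and capturing its stdout is left out.

```cpp
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one GPU row reported by nvidia-smi.
struct GpuInfo {
  std::string name;
  std::uint64_t total_mib = 0;
  std::uint64_t free_mib = 0;
};

// Strip leading and trailing spaces, tabs, and line endings.
static std::string Trim(const std::string& s) {
  const auto b = s.find_first_not_of(" \t\r\n");
  if (b == std::string::npos) return "";
  const auto e = s.find_last_not_of(" \t\r\n");
  return s.substr(b, e - b + 1);
}

// Parse "name, total, free" rows. Rows with missing or non-numeric
// fields (for example "[N/A]") are skipped rather than treated as errors.
std::vector<GpuInfo> ParseNvidiaSmiCsv(const std::string& csv) {
  std::vector<GpuInfo> gpus;
  std::istringstream lines(csv);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string name, total, free_mem;
    if (!std::getline(fields, name, ',') ||
        !std::getline(fields, total, ',') ||
        !std::getline(fields, free_mem, ',')) {
      continue;
    }
    try {
      GpuInfo g;
      g.name = Trim(name);
      g.total_mib = std::stoull(Trim(total));
      g.free_mib = std::stoull(Trim(free_mem));
      gpus.push_back(g);
    } catch (const std::exception&) {
      // std::stoull threw: the field was not a number, so skip the row.
    }
  }
  return gpus;
}

// True if the model is expected to fit, keeping some headroom free.
// The default headroom is an assumed value, not a measured one.
bool FitsInVram(const GpuInfo& gpu, std::uint64_t required_mib,
                std::uint64_t headroom_mib = 512) {
  return gpu.free_mib >= required_mib + headroom_mib;
}
```

A caller could run this check before loading a model and refuse with a clear message when `FitsInVram` returns false. An equivalent parser for vulkaninfoSDK output would need a different format and is not covered here.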

0xSage commented 1 week ago
  1. When in runtime do we detect OS and architecture?
  2. Do we have graceful failures when users have an incompatible setup?

At the moment we fail silently. Users get a vague message and have to send us their logs, creating more work on both sides. If they have a niche architecture that is not supported, we should just make that very clear in the error. (More likely, they'll have downloaded the wrong distro, in which case a clear error message would be nice.)

  1. Do we currently have a compatibility chart anywhere on supported OS/hardware and versions?
  2. If not, let's make one? For all 3 engines.

0xSage commented 1 week ago

@dan-homebrew let's handle graceful failures for common model loading errors in a separate ticket. 🙏

namchuai commented 1 week ago
  1. When in runtime do we detect OS and architecture? I don't think we need this because our executable is built separately for each platform, so we can use macros to detect the OS and arch at compile time.

  2. Do we have graceful failures when users have an incompatible setup? Currently we don't have a general message for users who have an incompatible setup. I think we can run the check in the main process when starting cortex and write to std::cerr if the user has an incompatible setup.

  1. Do we currently have a compatibility chart anywhere on supported OS/hardware and versions?

    • We don't have any chart at the moment.
  2. If not, let's make one? For all 3 engines.

    • 👍

Please update me if I'm wrong @nguyenhoangthuan99 @vansangpfiev

0xSage commented 1 week ago
  1. See bug https://github.com/janhq/jan/issues/2734 . We also need to think through whether this is an API endpoint used by Jan.

  2. I think we should have error codes like UnsupportedCPU or InsufficientMemory, similar to OpenAI, but covering a lower level of errors that we might not want to abstract away from users at the moment. The errors should get properly bubbled up to users in the Jan app (cc @louis-jan ) so we can stop asking people for their logs. 😢
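
One possible shape for such error codes, as a sketch only. `HardwareError`, `ErrorInfo`, `Describe`, and the message texts are hypothetical and do not reflect an existing cortex.cpp or Jan API. The code/message pair is only loosely inspired by the OpenAI-style errors mentioned above.

```cpp
#include <string>

// Hypothetical low-level error codes that a client such as Jan could display.
enum class HardwareError {
  kNone,
  kUnsupportedCPU,
  kInsufficientMemory,
};

// Machine-readable code plus a message meant for end users.
struct ErrorInfo {
  std::string code;
  std::string message;
};

// Map an error value to the payload that would be bubbled up to the client.
ErrorInfo Describe(HardwareError e) {
  switch (e) {
    case HardwareError::kUnsupportedCPU:
      return {"UnsupportedCPU",
              "This CPU is not supported by the installed build."};
    case HardwareError::kInsufficientMemory:
      return {"InsufficientMemory",
              "Not enough free memory to load the requested model."};
    case HardwareError::kNone:
      break;
  }
  return {"", ""};  // no error
}
```

Keeping the code string stable and separate from the message would let the app show a friendly explanation while still giving maintainers something exact to search for.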

Compatibility chart DRAFT. @Van-QA I'm wondering if you have a better version?

https://docs.google.com/spreadsheets/d/1skQLXm2iVjEsG_TJsTN7jH7nfMTj7XMx6QBKG2DRlfc/edit?gid=1694305799#gid=1694305799