Discussion: Cortex.cpp Hardware Detection, Selection, and Memory Management

dan-homebrew commented 1 week ago

Overview

Note: We will probably need to break this discussion down into smaller topics:

Detection: How do we detect the user's hardware, including GPUs (Nvidia, AMD, etc.)?
Selection: Can the user select which hardware to run on (e.g. CPU-only, GPU 1 or 2)?
Prediction: Can we predict which models won't run, based on the hardware selection?
Memory Management: Can we detect how much GPU VRAM the currently loaded models are using (e.g. to prevent OOM errors when the user loads a new model)?

How do we detect user's hardware?
- For OS:
- [x] We use compile-time macros to detect the user's OS.
- For CPU arch:
- [x] Same as above: we use macros. Code reference: https://github.com/janhq/cortex/blob/af43dc0cb107d9b06bbe038ff5c93e018dc5ec46/engine/utils/system_info_utils.h#L52
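
For readers unfamiliar with the macro approach, here is a minimal sketch of compile-time OS and architecture detection. It is not the code from `system_info_utils.h`; the names `kOsName`, `kArchName`, and `PlatformTag` are made up for illustration, and only the predefined compiler macros are real.

```cpp
#include <string>

// Hypothetical constants for illustration; not cortex's actual identifiers.
// OS detection via macros predefined by the compiler for the build target.
#if defined(_WIN32)
constexpr const char* kOsName = "windows";
#elif defined(__APPLE__) && defined(__MACH__)
constexpr const char* kOsName = "macos";
#elif defined(__linux__)
constexpr const char* kOsName = "linux";
#else
constexpr const char* kOsName = "unsupported";
#endif

// CPU architecture detection (GCC/Clang macros first, MSVC macros second).
#if defined(__x86_64__) || defined(_M_X64)
constexpr const char* kArchName = "amd64";
#elif defined(__aarch64__) || defined(_M_ARM64)
constexpr const char* kArchName = "arm64";
#else
constexpr const char* kArchName = "unsupported";
#endif

// Builds a tag such as "linux-amd64" from the two constants above.
inline std::string PlatformTag() {
  return std::string(kOsName) + "-" + kArchName;
}
```

Note that these macros describe the platform the binary was built for, not the machine it is running on, so an amd64 build running under emulation on an arm64 machine would still report amd64.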
How do we detect GPUs (Nvidia, AMD, etc.)?
- [x] We dump data from nvidia-smi (an executable from Nvidia that ships with the Nvidia driver, IIRC).
- [ ] For AMD GPUs, we will dump data from vulkaninfoSDK. This executable is available for download, so we need to either download it on demand or package it.
Are we able to detect how much GPU VRAM current models have (e.g. to prevent the user from hitting OOM errors when loading a new model)? nvidia-smi does provide VRAM information. I'm not entirely sure about vulkaninfoSDK, though. Will keep this updated.
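
As a rough illustration of the nvidia-smi route, the sketch below parses one line of output from `nvidia-smi --query-gpu=index,name,memory.total,memory.used --format=csv,noheader,nounits` (memory values are reported in MiB) and checks whether a model of a given size would fit in the remaining VRAM. `GpuInfo`, `ParseGpuLine`, and `FitsInVram` are hypothetical names, and this is an assumption about how the check could look, not a description of what cortex does today.

```cpp
#include <cstdint>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record for one GPU; field names are illustrative only.
struct GpuInfo {
  int index = 0;
  std::string name;
  std::uint64_t total_mib = 0;
  std::uint64_t used_mib = 0;

  // Free VRAM in MiB, clamped at zero.
  std::uint64_t FreeMib() const {
    return total_mib > used_mib ? total_mib - used_mib : 0;
  }
};

// Strips leading and trailing whitespace.
inline std::string Trim(const std::string& s) {
  const auto begin = s.find_first_not_of(" \t\r\n");
  if (begin == std::string::npos) return "";
  const auto end = s.find_last_not_of(" \t\r\n");
  return s.substr(begin, end - begin + 1);
}

// Parses a CSV line such as "0, NVIDIA GeForce RTX 3090, 24576, 1024".
// Returns false (and leaves `out` untouched) if the line is malformed.
// Assumes the GPU name itself contains no comma.
inline bool ParseGpuLine(const std::string& line, GpuInfo& out) {
  std::vector<std::string> fields;
  std::stringstream ss(line);
  std::string field;
  while (std::getline(ss, field, ',')) fields.push_back(Trim(field));
  if (fields.size() != 4) return false;
  try {
    GpuInfo gpu;
    gpu.index = std::stoi(fields[0]);
    gpu.name = fields[1];
    gpu.total_mib = std::stoull(fields[2]);
    gpu.used_mib = std::stoull(fields[3]);
    out = gpu;
    return true;
  } catch (const std::exception&) {
    return false;  // e.g. a non-numeric value such as "[N/A]"
  }
}

// True if `required_mib` fits in the GPU's currently free VRAM.
inline bool FitsInVram(const GpuInfo& gpu, std::uint64_t required_mib) {
  return gpu.FreeMib() >= required_mib;
}
```

Running the check before loading a model would let cortex refuse the load with a clear message instead of hitting an OOM error. The free-VRAM figure is only a snapshot, so other processes can still claim memory between the check and the load.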

0xSage commented 1 week ago

At what point at runtime do we detect OS and architecture?
Do we have graceful failures when users have an incompatible setup?

What's the error message when we detect an incompatible OS? Recommendation? Fallback option (CPU)?
What's the error message when we detect incompatible hardware? Recommendation? Fallback option (if any)?
Any other incompatibility checks we can make and alert on?
Is it good practice to link to a support page?
Are these error messages implemented in the cortexcpp API, so that the Jan application can bubble them up to users?

At the moment we fail silently. Users get a vague message and have to send us their logs, creating more work on both sides. If they have a niche architecture that is not supported, we should just make that very clear in the error. (More likely, they'll have downloaded the wrong distro, in which case a clear error message would be nice.)

Do we currently have a compatibility chart anywhere on supported OS/hardware and versions?
If not, let's make one? For all 3 engines.

0xSage commented 1 week ago

@dan-homebrew let's handle the common model loading graceful failures in a separate ticket. 🙏

namchuai commented 1 week ago

When in runtime do we detect OS and architecture? I don't think we need this, because our executable is built separately for each platform, so we can use macros to detect the OS and arch.
Do we have graceful failures when users have incompatible setup? Currently we don't have a general message for users with an incompatible setup. I think we can run the check in the main process when starting cortex and write to std::cerr if the user has an incompatible setup.

What's the error message when we detect incompat OS? Recommendation? Fallback option (CPU)?
- IMO: Incompatible OS! Cortex only supports Windows, Linux, and macOS. Exiting..
What's the error message when we detect incompatible hardware? Recommendation? Fallback option (if any)?
- I might need some examples here. We only have executables for amd64 and arm64, so on any other arch the executable won't run at all.
- As for GPU incompatibility, I don't have any ideas. Please suggest!
Any other incompatible checks we can make and alert on?
- I think not. Please correct me if I'm wrong @nguyenhoangthuan99 @vansangpfiev
Is it good practice to link to a support page?
- Yes, I think so.
Are these error messages implemented in cortexcpp API, so that Jan application can bubble it up to users?
- I think we should be unopinionated and provide an error message along with an error code, so that Jan (and other cortex consumer apps) can choose to bubble it up to the user or alter it as they want.

Do we currently have a compatibility chart anywhere on supported OS/hardware and versions?
- We don't have any chart at the moment.
If not, let's make one? For all 3 engines.
- 👍

Please update me if I'm wrong @nguyenhoangthuan99 @vansangpfiev

0xSage commented 1 week ago

See bug https://github.com/janhq/jan/issues/2734. We also need to think through whether this is an API endpoint used by Jan.
I think we should have error codes like UnsupportedCPU or InsufficientMemory, similar to OpenAI, but covering lower-level errors that we might not want to abstract away from users at the moment. That way the errors get properly bubbled up to users in the Jan app (cc @louis-jan) so we can stop asking people for their logs. 😢
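
To make the "error code plus message" idea concrete, here is a small sketch of what such a type could look like. Everything in it is hypothetical: only the names UnsupportedCPU and InsufficientMemory come from this thread, while `HardwareErrorCode`, `HardwareError`, `ToString`, and `CheckMemory` are invented for illustration and do not exist in cortex.

```cpp
#include <cstdint>
#include <string>

// Hypothetical error codes. kUnsupportedOS is an extra example added here.
enum class HardwareErrorCode {
  kOk = 0,
  kUnsupportedOS,
  kUnsupportedCPU,
  kInsufficientMemory,
};

// A code for machines plus a message for humans, so a consumer app such as
// Jan can show the message as-is or substitute its own wording.
struct HardwareError {
  HardwareErrorCode code;
  std::string message;
};

// Stable string form of the code, suitable for an API response field.
inline const char* ToString(HardwareErrorCode code) {
  switch (code) {
    case HardwareErrorCode::kOk: return "Ok";
    case HardwareErrorCode::kUnsupportedOS: return "UnsupportedOS";
    case HardwareErrorCode::kUnsupportedCPU: return "UnsupportedCPU";
    case HardwareErrorCode::kInsufficientMemory: return "InsufficientMemory";
  }
  return "Unknown";
}

// Example check: compares the memory a model needs with what is free.
inline HardwareError CheckMemory(std::uint64_t required_mib,
                                 std::uint64_t free_mib) {
  if (required_mib > free_mib) {
    return {HardwareErrorCode::kInsufficientMemory,
            "Model needs " + std::to_string(required_mib) +
                " MiB but only " + std::to_string(free_mib) +
                " MiB is free."};
  }
  return {HardwareErrorCode::kOk, ""};
}
```

Keeping the code separate from the message is what lets cortex stay unopinionated: the code is the contract, and the message is only a default that callers may replace.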

Compatibility chart DRAFT. @Van-QA I'm wondering if you have a better version?

https://docs.google.com/spreadsheets/d/1skQLXm2iVjEsG_TJsTN7jH7nfMTj7XMx6QBKG2DRlfc/edit?gid=1694305799#gid=1694305799

janhq / cortex.cpp

Discussion: Cortex.cpp Hardware Detection, Selection, and Memory Management #1089
