withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Enforce a JSON schema on the model output at the generation level
https://withcatai.github.io/node-llama-cpp/
MIT License

feat: use the best compute layer available by default #175

Closed · giladgd closed 4 months ago

giladgd commented 4 months ago

Description of change

Fixes #160 Fixes #169

How to use node-llama-cpp after this change

node-llama-cpp will now detect the available compute layers on the system and use the best one by default. If the best one fails to load, it'll try the next best option until it manages to load the bindings.

To use this logic, just use getLlama without specifying the compute layer:

import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
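To check which compute layer the automatic selection ended up loading, something like the following should work. This is a minimal sketch, assuming the returned Llama instance exposes a gpu property that reports the active layer (the exact property name may differ in your installed version):

import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// `llama.gpu` is assumed to report the compute layer that was loaded,
// e.g. "cuda", "vulkan", "metal", or false when running on CPU only
console.log("Using compute layer:", llama.gpu);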

To force it to load a specific compute layer, you can use the gpu parameter on getLlama:

import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan" // Defaults to `"auto"`; can also be `"cuda"` or `false` (to not use the GPU at all)
});
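When a specific compute layer is forced like this, there is no fallback chain, so it can make sense to handle a load failure yourself. A hedged sketch, assuming getLlama throws when the requested bindings cannot be loaded:

import {getLlama} from "node-llama-cpp";

let llama;
try {
    // Force CUDA; unlike "auto", a specific layer is assumed not to fall back on failure
    llama = await getLlama({gpu: "cuda"});
} catch (error) {
    console.warn("CUDA bindings failed to load, falling back to CPU:", error);
    llama = await getLlama({gpu: false});
}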

To inspect what compute layers are detected in your system, you can run this command:

npx --no node-llama-cpp inspect gpu

If this command fails to detect CUDA or Vulkan even though calling getLlama with gpu set to one of them works, please open an issue so I can investigate it.


github-actions[bot] commented 4 months ago

🎉 This PR is included in version 3.0.0-beta.13 🎉

The release is available on:

Your semantic-release bot 📦🚀