### Description of change
- feat: detect the available compute layers on the system and use the best one by default
- feat: more guardrails to avoid loading an incompatible prebuilt binary, to prevent process crashes due to Linux distro differences
- feat: improve logs to explain why system-related issues occur and how to fix them
- feat: add an `inspect` command
- feat: add `GemmaChatWrapper` (usage sketch after the Fixes lines below)
- feat: add `TemplateChatWrapper` - an easier way to create simple chat wrappers; see the type docs for more info
- fix: adapt to a `llama.cpp` breaking change
- fix: when a specific compute layer is requested, fail the build if it is not found
- fix: return user-defined llama tokens
- docs: update more docs to prepare for version 3.0
Fixes #160
Fixes #169
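As a rough sketch of how the new chat wrappers can be plugged in - the model path is a placeholder and the surrounding calls reflect the current 3.0 beta API as I understand it; check the type docs for the exact options:

```typescript
import {getLlama, LlamaChatSession, GemmaChatWrapper} from "node-llama-cpp";

const llama = await getLlama();

// "path/to/gemma-model.gguf" is a placeholder, not a file shipped with the library
const model = await llama.loadModel({modelPath: "path/to/gemma-model.gguf"});
const context = await model.createContext();

// pass the new wrapper explicitly instead of relying on automatic detection
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new GemmaChatWrapper()
});

console.log(await session.prompt("Hi there"));
```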
### How to use `node-llama-cpp` after this change
`node-llama-cpp` will now detect the available compute layers on the system and use the best one by default.
If the best one fails to load, it'll try the next best option until it manages to load the bindings.
To use this logic, just call `getLlama` without specifying a compute layer:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
```
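If you want to see which compute layer was actually loaded, the resolved backend should be inspectable on the returned instance - a small sketch, assuming the `gpu` property reflects the loaded layer:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// prints the compute layer that was actually loaded, e.g. "cuda", "vulkan", or false for CPU-only
console.log("Loaded compute layer:", llama.gpu);
```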
To force it to load a specific compute layer, you can use the `gpu` parameter on `getLlama`:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan" // defaults to `"auto"`. can also be `"cuda"` or `false` (to not use the GPU at all)
});
```
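When a specific layer is requested this way and it cannot be loaded, I understand `getLlama` fails instead of silently falling back (an assumption based on the new guardrails), so you may want to handle the fallback yourself - a minimal sketch:

```typescript
import {getLlama} from "node-llama-cpp";

let llama;
try {
    // explicitly request CUDA; assumed to throw if the CUDA bindings cannot be loaded
    llama = await getLlama({gpu: "cuda"});
} catch (err) {
    // fall back to automatic detection of the best available compute layer
    llama = await getLlama();
}
```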
To inspect what compute layers are detected on your system, you can run this command:
```bash
npx --no node-llama-cpp inspect gpu
```
If this command fails to detect CUDA or Vulkan even though using `getLlama` with `gpu` set to one of them works, please open an issue so I can investigate it.
### Pull-Request Checklist
- [x] Code is up-to-date with the `master` branch
- [x] `npm run format` to apply eslint formatting
- [x] `npm run test` passes with this change
- [x] This pull request links relevant issues as `Fixes #0000`
- [ ] There are new or updated unit tests validating the change
- [x] Documentation has been updated to reflect this change
- [x] The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)