### Description of change
- feat: detect the available compute layers on the system and use the best one by default
- feat: more guardrails to avoid loading an incompatible prebuilt binary, to prevent process crashes due to Linux distro differences
- feat: improve logs to explain why system-related issues occur and how to fix them
- feat: add an `inspect` command
- feat: add `GemmaChatWrapper` (usage sketch after the Fixes lines below)
- feat: add `TemplateChatWrapper` - an easier way to create simple chat wrappers; see the type docs for more info
- fix: adapt to a `llama.cpp` breaking change
- fix: when a specific compute layer is requested, fail the build if it is not found
- fix: return user-defined llama tokens
- docs: update more docs to prepare for version 3.0
Fixes #160
Fixes #169
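As a rough sketch of how the new chat wrappers can be plugged in - the model path is a placeholder and the surrounding calls reflect the current 3.0 beta API as I understand it; check the type docs for the exact options:

```typescript
import {getLlama, LlamaChatSession, GemmaChatWrapper} from "node-llama-cpp";

const llama = await getLlama();

// "path/to/gemma-model.gguf" is a placeholder, not a file shipped with the library
const model = await llama.loadModel({modelPath: "path/to/gemma-model.gguf"});
const context = await model.createContext();

// pass the new wrapper explicitly instead of relying on automatic detection
const session = new LlamaChatSession({
    contextSequence: context.getSequence(),
    chatWrapper: new GemmaChatWrapper()
});

console.log(await session.prompt("Hi there"));
```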
### How to use `node-llama-cpp` after this change
`node-llama-cpp` will now detect the available compute layers on the system and use the best one by default.
If the best one fails to load, it'll try the next best option until it manages to load the bindings.
To use this logic, just call `getLlama` without specifying a compute layer:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();
```
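If you want to see which compute layer was actually loaded, the resolved backend should be inspectable on the returned instance - a small sketch, assuming the `gpu` property reflects the loaded layer:

```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama();

// prints the compute layer that was actually loaded, e.g. "cuda", "vulkan", or false for CPU-only
console.log("Loaded compute layer:", llama.gpu);
```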
To force it to load a specific compute layer, you can use the `gpu` parameter on `getLlama`:
```typescript
import {getLlama} from "node-llama-cpp";

const llama = await getLlama({
    gpu: "vulkan" // defaults to `"auto"`. can also be `"cuda"` or `false` (to not use the GPU at all)
});
```
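When a specific layer is requested this way and it cannot be loaded, I understand `getLlama` fails instead of silently falling back (an assumption based on the new guardrails), so you may want to handle the fallback yourself - a minimal sketch:

```typescript
import {getLlama} from "node-llama-cpp";

let llama;
try {
    // explicitly request CUDA; assumed to throw if the CUDA bindings cannot be loaded
    llama = await getLlama({gpu: "cuda"});
} catch (err) {
    // fall back to automatic detection of the best available compute layer
    llama = await getLlama();
}
```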
To inspect what compute layers are detected on your system, you can run this command:
```bash
npx --no node-llama-cpp inspect gpu
```
If this command fails to detect CUDA or Vulkan even though using `getLlama` with `gpu` set to one of them works, please open an issue so I can investigate it.
### Pull-Request Checklist
- [x] Code is up-to-date with the `master` branch
- [x] `npm run format` to apply eslint formatting
- [x] `npm run test` passes with this change
- [x] This pull request links relevant issues as `Fixes #0000`
- [ ] There are new or updated unit tests validating the change
- [x] Documentation has been updated to reflect this change
- [x] The new commits and pull request title follow conventions explained in pull request guidelines (PRs that do not follow this convention will not be merged)