Here is a full, working loading setup in the form of a wrapper class (the entire file):
import { ipcMain } from 'electron'
import type { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'
import mod from 'node-llama-cpp'

export class LlamaProvider {
  private modelPath: string
  private loadedModelID: string
  private model: LlamaModel
  private context: LlamaContext
  private session: LlamaChatSession

  constructor () {
    this.loadedModelID = 'mistral-7b-openorca.Q4_K_M.gguf' // TODO
    // DEBUG
    this.modelPath = '/Users/hendrik/Documents/dev/llama.cpp/models/mistral-7b-openorca.Q4_K_M.gguf'

    // Hook up event listeners
    ipcMain.handle('get-model-id', (event, args) => {
      return this.loadedModelID
    })
  }

  async boot () {
    // const { LlamaModel, LlamaContext, LlamaChatSession } = await import('node-llama-cpp')
    console.log('Loading model ...')
    // Under webpack, the default import is (unexpectedly) a promise that
    // resolves to the actual module exports, so it must be awaited first.
    const resolved = await (mod as any)
    console.log(resolved.LlamaModel)

    this.model = new resolved.LlamaModel({ modelPath: this.modelPath })
    console.log('Model loaded. Generating context ...')
    this.context = new resolved.LlamaContext({ model: this.model })
    console.log('Context loaded. Starting new session ...')
    this.session = new resolved.LlamaChatSession({ context: this.context })
    console.log('Session started -- all set!')

    // Example code copied to demonstrate that this code works
    const q1 = "Hi there, how are you?";
    console.log("User: " + q1);
    const a1 = await this.session.prompt(q1);
    console.log("AI: " + a1);

    const q2 = "Summarize what you said";
    console.log("User: " + q2);
    const a2 = await this.session.prompt(q2);
    console.log("AI: " + a2);
  }
}
Running boot() then produces the following output:

Loading model ...
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /Users/path/to/mistral-7b-openorca.Q4_K_M.gguf (version GGUF V2)
[... TRUNCATED: Llama.cpp boot up console logs]
...............................................................................................
Model loaded. Generating context ...
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_build_graph: non-view tensors processed: 676/676
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Pro
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading 'path/to/app/.webpack/main/native_modules/llamaBins/mac-arm64/ggml-metal.metal'
ggml_metal_init: GPU name: Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: recommendedMaxWorkingSetSize = 11453.25 MB
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: compute buffer total size = 291.19 MiB
llama_new_context_with_model: max tensor size = 102.55 MiB
ggml_metal_add_buffer: allocated 'data ' buffer, size = 4166.09 MiB, ( 4167.72 / 10922.67)
ggml_metal_add_buffer: allocated 'kv ' buffer, size = 512.03 MiB, ( 4679.75 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc ' buffer, size = 288.02 MiB, ( 4967.77 / 10922.67)
Context loaded. Starting new session ...
Session started -- all set!
User: Hi there, how are you?
AI: Hi there! I'm doing well, thank you for asking. How can I help you today?
User: Summarize what you said
AI: I greeted you and asked how I could help.
ggml_metal_free: deallocating
For reference, here is my tsconfig.json:

{
  "compilerOptions": {
    "target": "ES2019",
    "allowJs": true,
    "module": "commonjs",
    "skipLibCheck": true,
    "esModuleInterop": true,
    "strict": true,
    "jsx": "preserve",
    "strictPropertyInitialization": false,
    "noImplicitAny": true,
    "sourceMap": true,
    "outDir": "dist",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "downlevelIteration": true,
    "baseUrl": "."
  },
  "include": [
    "src/**/*",
    "forge.config.js",
    "webpack.*.js"
  ],
  "ts-node": {
    "require": ["tsconfig-paths/register"]
  }
}
Webpack uses TypeScript to transpile the TS to JS, using this rule:
{
  test: /\.(ts|tsx)$/,
  exclude: /(node_modules|\.webpack)/,
  use: {
    loader: 'ts-loader',
    options: {
      transpileOnly: true,
      appendTsSuffixTo: [/\.vue$/]
    }
  }
}
I have a very similar setup and am running into the same problem. In my case I can't even use a wrapper, because node-llama-cpp is a dependency of another package (langchain-js)...
Another update: I may have a theory about where this issue comes from. node-llama-cpp loads all files dynamically, including the *.node extension. In other words, there is no require/import of the dylib. It could be that Webpack notices this and decides to wrap the entire module in a promise that resolves when the promise of the loading subroutine finishes.
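To make the theory concrete, here is a hypothetical sketch (not node-llama-cpp's actual loader code) of why a dynamically resolved native addon is invisible to webpack's static analysis; resolveBinaryPath() is an illustrative stand-in for whatever platform detection the library performs:

import path from 'path'

// Hypothetical stand-in for the library's platform/architecture resolution
function resolveBinaryPath (): string {
  return path.join(__dirname, 'llamaBins', `${process.platform}-${process.arch}`, 'llama.node')
}

// Because the argument is only known at runtime, webpack cannot statically
// trace this require() call, so the .node file never enters its module graph.
const addon = require(resolveBinaryPath())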
Using webpack with a library that utilizes native node bindings is problematic: the code must use node's require on the .node file at its original location for it to work properly, while webpack is meant to bundle code together and handle the imports by itself, so these conflicting approaches may not work well together.
I advise you to try to move the code that uses node-llama-cpp outside of the frontend code that needs webpack, and put it in a part of your Electron app that can be transpiled with TypeScript's tsc directly.
Maybe you can use Electron's ipcMain to communicate between the main process, which will use node-llama-cpp directly without webpack, and the renderer process, which will use webpack.
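A minimal sketch of that split; the channel name 'llama-prompt', the model path, and the direct ipcRenderer use (instead of a contextBridge preload) are illustrative choices, not requirements:

// main.ts -- compiled with tsc, not webpack
import { ipcMain } from 'electron'
import { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'

const model = new LlamaModel({ modelPath: '/path/to/model.gguf' })
const context = new LlamaContext({ model })
const session = new LlamaChatSession({ context })

// Answer prompt requests coming from the webpack-bundled renderer
ipcMain.handle('llama-prompt', async (_event, prompt: string) => {
  return await session.prompt(prompt)
})

// renderer.ts -- bundled with webpack, never touches node-llama-cpp
import { ipcRenderer } from 'electron'

async function ask (prompt: string): Promise<string> {
  return await ipcRenderer.invoke('llama-prompt', prompt)
}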
To fix the weird types with your current setup, you could perhaps do something like this:
import nodeLlamaCpp from "node-llama-cpp";

async function doSomething() {
  const { LlamaModel, LlamaContext, LlamaChatSession } =
    (await nodeLlamaCpp) as any as typeof import("node-llama-cpp");
}

doSomething();
@giladgd Thanks for the response — that looks great, I'll try that.
Meanwhile, would it be possible to hardcode the various binaries? This way, one could tell webpack that something is external and it shouldn't touch it.
This works, for example, for chokidar (see https://github.com/paulmillr/chokidar/blob/master/lib/fsevents-handler.js). Specifically, one can configure webpack with externals: { fsevents: "require('fsevents')" }, and that works like a charm.
But the promise awaiting is fine for now; it's nothing too big to complain about.
@nathanlesage Hardcoding the binary paths is not possible due to the nature of this library: it's meant to support many OSes, architectures, compute layers, and dynamic arbitrary build options passed to the getLlama() method. For each possible configuration passed to getLlama there's a folder with a binary, either an existing prebuilt one or one that will be created on demand when building from source.
I think it'd be best to tell Webpack that the entire node-llama-cpp library is external.
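A sketch of what that could look like, assuming a CommonJS-targeted main-process bundle (an ESM bundle would need a different externals type):

// webpack.main.config.ts -- sketch only; merge into the existing config
import type { Configuration } from 'webpack'

const config: Configuration = {
  // ...the rest of the existing configuration...
  externals: {
    // Emit a plain require('node-llama-cpp') at runtime instead of bundling it
    'node-llama-cpp': 'commonjs node-llama-cpp'
  }
}

export default config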
Hi, I'm using Electron + Webpack + TypeScript + React. I followed your instructions, but I get this error:
[Error: ENOENT: no such file or directory, open '<project_path>\undefinedbinariesGithubRelease.json'] {
  errno: -4058,
  code: 'ENOENT',
  syscall: 'open',
  path: '<project_path>\\undefinedbinariesGithubRelease.json'
}
when calling:

async function llamaModule (): Promise<typeof import('node-llama-cpp')> {
  return await (mod as any);
}

const module = await llamaModule(); // error here
Do you have any suggestions?
@bqhuyy Can you please share more details about the issue you're facing? Please provide the full paths from the error, your OS type and version, Node.js version, node-llama-cpp version, your tsconfig.json, etc.
In the latest beta of version 3, I've added support for scaffolding a new project from a template. You can use it to generate an Electron project with everything configured already, so that you can use node-llama-cpp right away with full TypeScript support (including communication between the main process and the renderer process).
Run this command and select the Electron template to try it out:
npm create --yes node-llama-cpp@beta
Issue description
When bundling node-llama-cpp with webpack and TypeScript, there's something weird happening: webpack somehow appears to load the module as a promise. Once that promise is resolved, everything works fine, but this makes the code extremely weird.
Expected Behavior
Bundling code with webpack should work out of the box as indicated in the getting started guide.
NOTE: I am using webpack because I'm working on an Electron app with Electron Forge. I cannot "just" use TypeScript.
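For reference, this is the straightforward named-import form that should work out of the box (the same form as the commented-out line in the wrapper class above); under this webpack setup the named bindings instead arrive as undefined. The model path here is a placeholder:

import { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'

// Expected to work directly, without any awaiting of the module itself
const model = new LlamaModel({ modelPath: '/path/to/model.gguf' })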
Actual Behavior
Destructuring the module import resolves to undefined. Importing everything at once gives me a promise that, once awaited, actually yields the module as it should, and everything then works fine. See the code at the top of this thread.
Steps to reproduce
It works when I do the following mental gymnastics (shown in full in the wrapper class at the top; condensed in the sketch below):
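A condensed sketch of that workaround (the model path is a placeholder):

import mod from 'node-llama-cpp'

async function boot (): Promise<void> {
  // Under webpack the default import is a promise for the module's exports,
  // so it must be awaited before anything can be destructured from it
  const resolved = await (mod as any) as typeof import('node-llama-cpp')
  const model = new resolved.LlamaModel({ modelPath: '/path/to/model.gguf' })
  // ...continue with LlamaContext / LlamaChatSession as above...
}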
And I receive the proper output that indicates that llama.cpp has loaded successfully. I have not yet tried to prompt the model, but I can confirm that the model has been loaded into RAM successfully.
My Environment
node-llama-cpp version: [... TRUNCATED: environment table]
Additional Context
It appears that something the dist files of node-llama-cpp do is something webpack doesn't like. However, I have not yet had any success in finding the source. All the other handling (such as bundling the .node file, etc.) works flawlessly with the Electron Forge setup.
Relevant Features Used
Are you willing to resolve this issue by submitting a Pull Request?
No, I don’t have the time, but I can support development (with donations).