withcatai / node-llama-cpp

Run AI models locally on your machine with node.js bindings for llama.cpp. Force a JSON schema on the model output on the generation level
https://withcatai.github.io/node-llama-cpp/
MIT License

Issue with Webpack Compilation #135

Closed · nathanlesage closed this 2 months ago

nathanlesage commented 6 months ago

Issue description

When bundling node-llama-cpp with webpack and TypeScript, something strange happens: Webpack somehow appears to load the module as a promise. Once that promise is resolved, everything works fine, but it makes the code very awkward.

Expected Behavior

Bundling code with webpack should work out of the box as indicated in the getting started guide.

NOTE: I am using webpack because I'm working on an Electron app with Electron Forge. I cannot "just" use TypeScript.

Actual Behavior

Destructuring the module import results in undefined values. Importing everything as a single object gives me a promise that, once awaited, actually yields the module exports as it should, and they then work fine. See code:

// --> All undefined when running the app; throws "LlamaModel is not a constructor" because of that
import { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'

// If I import everything as a single object ...
import Llama from 'node-llama-cpp'
console.log(Llama) // -> 'Promise<pending>'

Steps to reproduce

It works when I do the following mental gymnastics:

import Llama from 'node-llama-cpp'

;(Llama as any).then(mod => {
  const model = new mod.LlamaModel({ modelPath: 'path/to/model' })
  // `model` will be a proper loaded LlamaModel instance that can be used further down the road.
})

And I receive the proper output indicating that llama.cpp has loaded successfully. I have not yet tried to prompt the model, but I can confirm that it has been loaded into RAM.

My Environment

Dependency              Version
Operating System        macOS Sonoma 14.2.1
CPU                     M2 Pro
Node.js version         18.19.0 LTS
TypeScript version      5.3.3
node-llama-cpp version  2.8.3

Additional Context

It appears that something in node-llama-cpp's dist files does something Webpack doesn't like. However, I have had no success finding the source yet.

All the other handling (such as bundling the .node file, etc.) works flawlessly with the Electron Forge setup.

Relevant Features Used

Are you willing to resolve this issue by submitting a Pull Request?

No, I don't have the time, but I can support development (with donations).

nathanlesage commented 6 months ago

Additional Info

Here is a full, working loading setup of a wrapper class (the entire file):

import { ipcMain } from 'electron'
import type { LlamaModel, LlamaContext, LlamaChatSession } from 'node-llama-cpp'
import mod from 'node-llama-cpp'

export class LlamaProvider {
  private modelPath: string
  private loadedModelID: string
  private model: LlamaModel
  private context: LlamaContext
  private session: LlamaChatSession

  constructor () {
    this.loadedModelID = 'mistral-7b-openorca.Q4_K_M.gguf' // TODO
    // DEBUG
    this.modelPath = '/Users/hendrik/Documents/dev/llama.cpp/models/mistral-7b-openorca.Q4_K_M.gguf'

    // Hook up event listeners
    ipcMain.handle('get-model-id', (event, args) => {
      return this.loadedModelID
    })
  }

  async boot() {
    // const { LlamaModel, LlamaContext, LlamaChatSession } = await import('node-llama-cpp')
    console.log('Loading model ...')
    const resolved = await (mod as any)
    console.log(resolved.LlamaModel)
    this.model = new resolved.LlamaModel({ modelPath: this.modelPath })
    console.log('Model loaded. Generating context ...')
    this.context = new resolved.LlamaContext({ model: this.model })
    console.log('Context loaded. Starting new session ...')
    this.session = new resolved.LlamaChatSession({ context: this.context })
    console.log('Session started -- all set!')

    // Example code copied to demonstrate that this code works
    const q1 = "Hi there, how are you?";
    console.log("User: " + q1);
    const a1 = await this.session.prompt(q1);
    console.log("AI: " + a1);
    const q2 = "Summarize what you said";
    console.log("User: " + q2);
    const a2 = await this.session.prompt(q2);
    console.log("AI: " + a2);
  }
}

Output

Loading model ...
llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /Users/path/to/mistral-7b-openorca.Q4_K_M.gguf (version GGUF V2)
[... TRUNCATED: Llama.cpp boot up console logs]
...............................................................................................
Model loaded. Generating context ...
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: KV self size  =  512.00 MiB, K (f16):  256.00 MiB, V (f16):  256.00 MiB
llama_build_graph: non-view tensors processed: 676/676
ggml_metal_init: allocating
ggml_metal_init: found device: Apple M2 Pro
ggml_metal_init: picking default device: Apple M2 Pro
ggml_metal_init: default.metallib not found, loading from source
ggml_metal_init: GGML_METAL_PATH_RESOURCES = nil
ggml_metal_init: loading 'path/to/app/.webpack/main/native_modules/llamaBins/mac-arm64/ggml-metal.metal'
ggml_metal_init: GPU name:   Apple M2 Pro
ggml_metal_init: GPU family: MTLGPUFamilyApple8 (1008)
ggml_metal_init: hasUnifiedMemory              = true
ggml_metal_init: recommendedMaxWorkingSetSize  = 11453.25 MB
ggml_metal_init: maxTransferRate               = built-in GPU
llama_new_context_with_model: compute buffer total size = 291.19 MiB
llama_new_context_with_model: max tensor size =   102.55 MiB
ggml_metal_add_buffer: allocated 'data            ' buffer, size =  4166.09 MiB, ( 4167.72 / 10922.67)
ggml_metal_add_buffer: allocated 'kv              ' buffer, size =   512.03 MiB, ( 4679.75 / 10922.67)
ggml_metal_add_buffer: allocated 'alloc           ' buffer, size =   288.02 MiB, ( 4967.77 / 10922.67)
Context loaded. Starting new session ...
Session started -- all set!
User: Hi there, how are you?
AI:  Hi there! I'm doing well, thank you for asking. How can I help you today?
User: Summarize what you said
AI:  I greeted you and asked how I could help.
ggml_metal_free: deallocating

TS Config

{
  "compilerOptions": {
    "target": "ES2019",
    "allowJs": true,
    "module": "commonjs",
    "skipLibCheck": true,
    "esModuleInterop": true,
    "strict": true,
    "jsx": "preserve",
    "strictPropertyInitialization": false,
    "noImplicitAny": true,
    "sourceMap": true,
    "outDir": "dist",
    "moduleResolution": "node",
    "resolveJsonModule": true,
    "downlevelIteration": true,
    "baseUrl": "."
  },
  "include": [
    "src/**/*",
    "forge.config.js",
    "webpack.*.js"
  ],
  "ts-node": {
    "require": ["tsconfig-paths/register"]
  }
}

Webpack config

Webpack uses ts-loader to transpile the TypeScript to JavaScript, using this rule:

{
    test: /\.(ts|tsx)$/,
    exclude: /(node_modules|\.webpack)/,
    use: {
      loader: 'ts-loader',
      options: {
        transpileOnly: true,
        appendTsSuffixTo: [/\.vue$/]
      }
    }
  }
DennisKo commented 6 months ago

I have a very similar setup and am running into the same problem. In my case I can't even use a wrapper, because node-llama-cpp is a dependency of another package (langchain-js)...

nathanlesage commented 6 months ago

Another update: I may have a theory about where this issue comes from: node-llama-cpp loads all of its files dynamically, including the *.node extension. In other words, there is no static require/import of the dylib. It could be that Webpack notices this and decides to wrap the entire module in a promise that resolves once the loading subroutine's promise finishes.
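
For illustration, the kind of pattern I mean looks roughly like this (a generic sketch, not node-llama-cpp's actual loader code; the file name is just a placeholder):

import { createRequire } from "module";
import path from "path";

const require = createRequire(import.meta.url);

// The binary's path depends on OS, architecture, etc. and is only known at runtime,
// so a bundler cannot statically resolve this require() call.
function loadNativeAddon(binsDir: string, platformArch: string): unknown {
  const addonPath = path.join(binsDir, platformArch, "llama-addon.node");
  return require(addonPath);
}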

giladgd commented 6 months ago

Using webpack with a library that relies on native Node bindings is problematic: the code must call Node's require on the .node file at its original location for it to work properly, while webpack is meant to bundle code together and handle the imports by itself. These conflicting approaches may not work well together.

I advise you to move the code that uses node-llama-cpp out of the frontend code that needs webpack, and put it in a part of your Electron app that can be transpiled with TypeScript's tsc directly. Maybe you can use Electron's ipcMain to communicate between the main process, which would use node-llama-cpp directly without webpack, and the renderer process, which would use webpack.
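
A rough sketch of that split (the channel name, file layout, and model path here are just examples, not an existing API):

// main.ts — main process, transpiled with tsc and kept outside the webpack bundle (sketch only)
import { ipcMain } from "electron";
import type { LlamaChatSession } from "node-llama-cpp";

let session: LlamaChatSession | undefined;

// "llama:prompt" is just an example channel name.
ipcMain.handle("llama:prompt", async (_event, text: string) => {
  if (session === undefined) {
    // Loading node-llama-cpp here keeps it away from webpack entirely.
    // (If tsc targets CommonJS, make sure the dynamic import isn't downleveled to require.)
    const { LlamaModel, LlamaContext, LlamaChatSession } = await import("node-llama-cpp");
    const model = new LlamaModel({ modelPath: "path/to/model.gguf" });
    const context = new LlamaContext({ model });
    session = new LlamaChatSession({ context });
  }
  return await session.prompt(text);
});

// The renderer side (bundled with webpack) then only talks to the main process over IPC, e.g.:
//   const answer = await ipcRenderer.invoke("llama:prompt", "Hi there, how are you?");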

To fix the weird types with your current setup, you could perhaps do something like this:

import nodeLlamaCpp from "node-llama-cpp";

async function doSomething() {
    const {LlamaModel, LlamaContext, LlamaChatSession} =
        (await nodeLlamaCpp) as any as typeof import("node-llama-cpp");
}

doSomething();
nathanlesage commented 6 months ago

@giladgd Thanks for the response — that looks great, I'll try that.

Meanwhile, would it be possible to hardcode the various binaries? This way, one could tell webpack that something is external and it shouldn't touch it.

This works, for example, for chokidar (see here: https://github.com/paulmillr/chokidar/blob/master/lib/fsevents-handler.js)

Specifically, one can configure webpack with externals: { fsevents: "require('fsevents')"} and that works like a charm.

But awaiting the promise is fine for now; it's nothing big enough to complain about.

giladgd commented 5 months ago

@nathanlesage Hardcoding the binary paths is not possible due to the nature of this library, as it's meant to support many OSs, architectures, compute layers, and dynamic arbitrary build options passed to the getLlama() method. For each possible configuration passed to the getLlama method there's a folder with a binary, either an existing prebuilt one or one that will be created on demand when building from source.
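
For illustration, a rough sketch of what that looks like in version 3 (the exact option names in the beta may differ):

import { getLlama } from "node-llama-cpp";

// Each distinct combination of options maps to its own binary folder:
// either an existing prebuilt binary or one compiled on demand.
const llama = await getLlama({
  gpu: "metal" // example option; omit it to let the library auto-detect
});
const model = await llama.loadModel({ modelPath: "path/to/model.gguf" });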

I think it'd be best to tell Webpack that the entire node-llama-cpp library is external.
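
For example, something along these lines in the main-process webpack config might do it (a sketch assuming webpack 5; I haven't verified it against this exact Electron Forge setup):

// webpack.main.config.js
module.exports = {
  // ...the rest of the existing main-process config...
  externals: {
    // Keep node-llama-cpp out of the bundle. Since the package is ESM-only,
    // the 'import' external type keeps a real dynamic import() at runtime
    // instead of a require() call, which would fail for an ES module.
    "node-llama-cpp": "import node-llama-cpp"
  }
};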

bqhuyy commented 4 months ago

Hi, I'm using Electron + Webpack + TypeScript + React. I followed your instructions, but I get this error:

Error: ENOENT: no such file or directory, open '<project_path>\undefinedbinariesGithubRelease.json'] {
  errno: -4058,
  code: 'ENOENT',
  syscall: 'open',
  path: '<project_path>\\undefinedbinariesGithubRelease.json'
}

when calling

async function llamaModule (): Promise<typeof import('node-llama-cpp')> {
  return await (mod as any);
}

const module = await llamaModule(); // error here

Do you have any suggestions?

giladgd commented 4 months ago

@bqhuyy Can you please share more details about the issue you're facing? Please provide the full paths in the error, your OS type and version, Node.js version, node-llama-cpp version, tsconfig.json, etc.

giladgd commented 2 months ago

In the latest beta of version 3, I've added support for scaffolding a new project from a template. You can use it to generate an Electron project with everything configured already so that you can use node-llama-cpp right away with full TypeScript support (including communication between the main process and the renderer process).

Run this command and select the Electron template to try it out:

npm create --yes node-llama-cpp@beta