ngxson / wllama

WebAssembly binding for llama.cpp - Enabling in-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Feature request: get the debug output through a callback? #40

Closed · flatsiedatsie closed this issue 6 months ago

flatsiedatsie commented 6 months ago

In my project I try to 'guess' how much memory the model + context is using with a napkin calculation (see screenshot below). But I was wondering: would it be possible to get more accurate information about memory use? Perhaps all the debug output could be made available for parsing through a callback, or even as an object that is returned/updated after loading/inference?

This would allow me to determine more accurately whether there is enough free memory to, for example, load the speech recognition and TTS processes (and other 'small stuff' like translation and OCR) without having to unload the main LLM.

[Screenshot 2024-05-16 at 09:49:56: napkin memory estimate]
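
For context, a minimal sketch of the kind of napkin calculation described above. All names and numbers here are hypothetical and not part of wllama's API; it only adds the model's file size to a rough KV-cache estimate:

```ts
// Hypothetical metadata about a loaded model (not wllama's actual API).
interface ModelMeta {
  fileSizeBytes: number;     // size of the GGUF file; weights dominate RAM use
  nLayer: number;            // number of transformer layers
  nEmbd: number;             // embedding dimension
  nCtx: number;              // context length the model is loaded with
  kvBytesPerElement: number; // 2 for an f16 KV cache, 4 for f32
}

// Rough upper-bound estimate: weights + KV cache + a fixed overhead.
// (GQA models need less KV cache than this; this is only napkin math.)
function estimateMemoryBytes(m: ModelMeta): number {
  // KV cache: keys + values, one vector per layer per context position.
  const kvCache = 2 * m.nLayer * m.nCtx * m.nEmbd * m.kvBytesPerElement;
  // Small allowance for compute buffers and bookkeeping (very rough).
  const overhead = 256 * 1024 * 1024;
  return m.fileSizeBytes + kvCache + overhead;
}

// Example: a ~4 GiB 7B-class model at 4096 context with an f16 KV cache.
const est = estimateMemoryBytes({
  fileSizeBytes: 4 * 1024 ** 3,
  nLayer: 32,
  nEmbd: 4096,
  nCtx: 4096,
  kvBytesPerElement: 2,
});
console.log(`Estimated memory: ${(est / 1024 ** 3).toFixed(2)} GiB`);
```

The feature requested here would replace this guesswork by exposing the actual figures that llama.cpp reports during loading, either via a debug-output callback or as a structured object.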
flatsiedatsie commented 6 months ago

I just realized this is related to: https://github.com/ngxson/wllama/issues/17