withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://node-llama-cpp.withcat.ai
MIT License

A way to get the answer while it's generating. #75

Closed: pierrbt closed this issue 10 months ago

pierrbt commented 10 months ago

Feature Description

I'm trying to generate answers from an AI model, and generation is pretty slow. I'd like to be able to receive the answer as it is being generated by llama.cpp. This is for a web chat, so streaming the output would be much better.

The Solution

It would be handy to be able to subscribe to the generation with on() events.

Considered Alternatives

I would handle the streamed output with React state on my end, so events/emitters would do the job. Alternatively, maybe another function, instead of prompt(), that takes a callback?
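For context, node-llama-cpp already exposes a callback of roughly this shape: prompt() accepts an onToken() option (it comes up later in this thread). Here is a minimal sketch, assuming the v2-era LlamaModel / LlamaContext / LlamaChatSession API and a hypothetical model file path:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Hypothetical model path; replace with your own model file.
const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "llama-2-7b-chat.Q4_K_M.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

const answer = await session.prompt("Hi there", {
    // Called repeatedly while the model generates; `tokens` is a batch of
    // token ids that can be turned back into text with context.decode().
    onToken(tokens) {
        process.stdout.write(context.decode(tokens));
    }
});

console.log("\nFull answer:", answer);
```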

Additional Context

I'd also like to know which models are lightweight: I tried LLaMA 2 (7B) and it's pretty slow even on an i5-12600K. Do you know of any faster models?


Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

pierrbt commented 10 months ago

I hadn't seen onToken(). If you have an idea for a lightweight model, that would be perfect.
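If on()-style events are still preferred for the web chat, the onToken() callback can be wrapped in a small EventEmitter. This is a hypothetical helper (promptWithEvents is not part of the library), sketched against the same assumed v2-era API as above:

```typescript
import {EventEmitter} from "events";
import {LlamaChatSession, LlamaContext} from "node-llama-cpp";

// Hypothetical helper: emits a "token" event for each decoded chunk,
// then "done" with the full answer, built on top of the onToken() option.
function promptWithEvents(session: LlamaChatSession, context: LlamaContext, text: string) {
    const emitter = new EventEmitter();

    session.prompt(text, {
        onToken(tokens) {
            emitter.emit("token", context.decode(tokens));
        }
    })
        .then((answer) => emitter.emit("done", answer))
        .catch((err) => emitter.emit("error", err));

    return emitter;
}

// Usage: forward each chunk to the web chat as it arrives.
// const stream = promptWithEvents(session, context, "Hi there");
// stream.on("token", (chunk) => process.stdout.write(chunk));
// stream.on("done", (answer) => console.log("\nFull answer:", answer));
```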