withcatai / node-llama-cpp

Run AI models locally on your machine with Node.js bindings for llama.cpp. Force a JSON schema on the model output at the generation level.
https://node-llama-cpp.withcat.ai
MIT License

A way to get the answer while it's generating. #75

Closed: pierrbt closed this issue 10 months ago

pierrbt commented 10 months ago

Feature Description

I'm trying to generate answers from an AI model, and generation is pretty slow. I'd like to be able to receive the answer as it is being generated by llama.cpp. This is for a web chat, so streaming the output would be much better.

The Solution

It would be handy to be able to subscribe to the generation with on() events.

Considered Alternatives

I would handle the streamed output with React state on my end, so events/emitters would do the job. Alternatively, maybe another function, instead of prompt(), that takes a callback?
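For context, node-llama-cpp already exposes a callback of roughly this shape: prompt() accepts an onToken() option (it comes up later in this thread). Here is a minimal sketch, assuming the v2-era LlamaModel / LlamaContext / LlamaChatSession API and a hypothetical model file path:

```typescript
import {fileURLToPath} from "url";
import path from "path";
import {LlamaModel, LlamaContext, LlamaChatSession} from "node-llama-cpp";

const __dirname = path.dirname(fileURLToPath(import.meta.url));

// Hypothetical model path; replace with your own model file.
const model = new LlamaModel({
    modelPath: path.join(__dirname, "models", "llama-2-7b-chat.Q4_K_M.gguf")
});
const context = new LlamaContext({model});
const session = new LlamaChatSession({context});

const answer = await session.prompt("Hi there", {
    // Called repeatedly while the model generates; `tokens` is a batch of
    // token ids that can be turned back into text with context.decode().
    onToken(tokens) {
        process.stdout.write(context.decode(tokens));
    }
});

console.log("\nFull answer:", answer);
```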

Additional Context

I'd also like to know which models are lightweight: I tried LLaMA 2 (7B) and it's pretty slow even on an i5-12600K. Do you know of any faster models?


Are you willing to resolve this issue by submitting a Pull Request?

Yes, I have the time, but I don't know how to start. I would need guidance.

pierrbt commented 10 months ago

I hadn't seen onToken(). If you have an idea for a lightweight model, that would be perfect.
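If on()-style events are still preferred for the web chat, the onToken() callback can be wrapped in a small EventEmitter. This is a hypothetical helper (promptWithEvents is not part of the library), sketched against the same assumed v2-era API as above:

```typescript
import {EventEmitter} from "events";
import {LlamaChatSession, LlamaContext} from "node-llama-cpp";

// Hypothetical helper: emits a "token" event for each decoded chunk,
// then "done" with the full answer, built on top of the onToken() option.
function promptWithEvents(session: LlamaChatSession, context: LlamaContext, text: string) {
    const emitter = new EventEmitter();

    session.prompt(text, {
        onToken(tokens) {
            emitter.emit("token", context.decode(tokens));
        }
    })
        .then((answer) => emitter.emit("done", answer))
        .catch((err) => emitter.emit("error", err));

    return emitter;
}

// Usage: forward each chunk to the web chat as it arrives.
// const stream = promptWithEvents(session, context, "Hi there");
// stream.on("token", (chunk) => process.stdout.write(chunk));
// stream.on("done", (answer) => console.log("\nFull answer:", answer));
```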