Open patrickjonesdotca opened 3 days ago
I made the server example (PR #367) to achieve something similar. If you don't want to build the server from source, compiled binaries are available as releases on my fork.
But an interactive mode could be a good idea too.
I'm thinking of a mode like the --interactive flag in llama.cpp, except that you only input prompts. The system would load all the other parameters into memory once, so they don't need to be reloaded for each prompt.
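
To make the request concrete, here is a rough sketch of the load-once, prompt-many loop in Python. It is not code from this project or from llama.cpp: `Settings`, `interactive_loop`, `fake_load_model`, and the `/quit` command are all hypothetical names made up for illustration, and the "model" is a stand-in that just upper-cases the prompt.

```python
import io
import sys
from dataclasses import dataclass
from typing import Callable, TextIO


@dataclass(frozen=True)
class Settings:
    # Hypothetical parameters that stay fixed for the whole session.
    model_path: str
    max_tokens: int = 128


def interactive_loop(
    settings: Settings,
    load_model: Callable[[Settings], Callable[[str], str]],
    stdin: TextIO = sys.stdin,
    stdout: TextIO = sys.stdout,
) -> int:
    """Load the model once, then answer one prompt per input line.

    Returns the number of prompts served.
    """
    generate = load_model(settings)  # expensive step, runs exactly once
    served = 0
    for line in stdin:
        prompt = line.strip()
        if not prompt:
            continue  # skip blank lines
        if prompt in {"/quit", "/exit"}:
            break  # hypothetical exit commands
        stdout.write(generate(prompt) + "\n")
        stdout.flush()
        served += 1
    return served


def fake_load_model(settings: Settings) -> Callable[[str], str]:
    # Stand-in for real model loading: returns a function that
    # upper-cases the prompt and truncates it to max_tokens characters.
    return lambda prompt: prompt.upper()[: settings.max_tokens]
```

Calling `interactive_loop(Settings("model.bin"), fake_load_model)` would read prompts from the terminal until `/quit` or end of input. The point is only that everything except the prompt is fixed before the loop starts, so each new prompt skips the load step.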