Closed — kevintanhongann closed this issue 1 month ago
Probably. Here's the current API call for chat:
I want this feature too
I am pretty sure this would (at least) require Generator#generate
to be enhanced with a callback that is invoked when generation completes.
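A minimal sketch of what such a callback-enhanced Generator#generate could look like. All names here (the `onToken` consumer, `onComplete` runnable, and the `EchoGenerator` stand-in) are hypothetical illustrations, not Jlama's actual API:

```java
import java.util.function.Consumer;

// Hypothetical sketch: a generate method taking a per-token consumer plus a
// completion callback, so both streaming and non-streaming callers can be
// notified when generation finishes.
interface Generator {
    String generate(String prompt,
                    Consumer<String> onToken,  // called for each generated token
                    Runnable onComplete);      // called once, when generation ends
}

// Toy implementation that "generates" by echoing the prompt token by token.
class EchoGenerator implements Generator {
    @Override
    public String generate(String prompt, Consumer<String> onToken, Runnable onComplete) {
        StringBuilder out = new StringBuilder();
        for (String token : prompt.split(" ")) { // stand-in for real decoding
            onToken.accept(token);               // stream each token to the caller
            out.append(token).append(' ');
        }
        onComplete.run();                        // signal completion exactly once
        return out.toString().trim();
    }
}
```

A caller doing stream=false would simply pass a no-op `onToken` and wait for `onComplete`, while a streaming caller forwards each token as it arrives.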
You mean for stream=false?
For both :)
Working PR here: https://github.com/tjake/Jlama/pull/23
Is there a way to run and expose an API streaming server compatible with OpenAI API specifications?