tjake / Jlama

Jlama is a modern LLM inference engine for Java
Apache License 2.0

streaming server support? #20

Closed by kevintanhongann 1 month ago

kevintanhongann commented 7 months ago

Is there a way to run Jlama as a server that exposes a streaming API compatible with the OpenAI API specification?
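For context (not part of the original thread), "compatible with OpenAI API specifications" for streaming means the chat completions endpoint, when called with `"stream": true`, returns server-sent events in which each event is a JSON `chat.completion.chunk` carrying a `delta`, terminated by a final `data: [DONE]` event:

```
data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl-1","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```

A compatible server would need to emit tokens in this framing as they are generated, rather than returning one response body at the end.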

tjake commented 7 months ago

Probably. Here's the current API endpoint for chat:

https://github.com/tjake/Jlama/blob/main/jlama-cli/src/main/java/com/github/tjake/jlama/cli/serve/GenerateResource.java

phact commented 7 months ago

I want this feature too

geoand commented 7 months ago

I am pretty sure that this would (at least) require Generator#generate to be enhanced with a callback that is called when the generation is complete.
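A minimal sketch of what that enhancement could look like. All names here (`TokenListener`, the `generate` signature, the canned tokens) are assumptions for illustration, not Jlama's actual API: the generator invokes a per-token callback as it decodes, plus a completion callback that a streaming HTTP layer could use to close the response.

```java
// Hypothetical sketch, not Jlama's real Generator API: a generate method
// that pushes each token through a callback and signals completion.
class StreamingSketch {

    interface TokenListener {
        void onToken(String token);       // invoked for every generated token
        void onComplete(String fullText); // invoked once when generation finishes
    }

    // Stand-in for Generator#generate: streams a canned response token by token.
    static void generate(String prompt, TokenListener listener) {
        String[] tokens = {"Hello", ", ", "world", "!"};
        StringBuilder full = new StringBuilder();
        for (String t : tokens) {
            full.append(t);
            listener.onToken(t); // an SSE layer would flush this to the client
        }
        listener.onComplete(full.toString());
    }

    public static void main(String[] args) {
        StringBuilder streamed = new StringBuilder();
        generate("hi", new TokenListener() {
            public void onToken(String token) { streamed.append(token); }
            public void onComplete(String fullText) {
                System.out.println("complete: " + fullText);
            }
        });
        System.out.println("streamed: " + streamed);
    }
}
```

With this shape, `stream=true` forwards each `onToken` as an SSE chunk, while `stream=false` simply waits for `onComplete` and returns the full text, which is why the callback helps both cases.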

phact commented 6 months ago

You mean for `stream=false`?

geoand commented 6 months ago

For both :)

phact commented 6 months ago

Working PR here: https://github.com/tjake/Jlama/pull/23