patterns-ai-core / langchainrb

Build LLM-powered applications in Ruby
https://rubydoc.info/gems/langchainrb
MIT License

Support Ollama streamed responses #644

Closed dferrazm closed 3 months ago

dferrazm commented 4 months ago

To generate non-streamed responses, call the methods without passing a block. To generate streamed responses, pass a block. E.g.

# non-streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }])
resp.chat_completion # => "Hello there!"

# streamed response
resp = ollama.chat(messages: [{ role: "user", content: "Hi" }]) { |resp| print resp.chat_completion }
# Prints "Hello there!" chunk by chunk as the response streams in
resp.chat_completion # => "Hello there!"

Note: Passing the stream parameter to the method no longer has any effect.

Closes #550

andreibondarev commented 4 months ago

@dferrazm I just tried this but nothing was streamed to me:

irb(main):005> llm = Langchain::LLM::Ollama.new(url: ENV["OLLAMA_URL"])
=>
#<Langchain::LLM::Ollama:0x0000000128a3a3d0
...
irb(main):006* llm.chat messages: [{role:"user", content:"hey"}] do |chunk|
irb(main):007*   puts chunk
irb(main):008> end
#<Langchain::LLM::OllamaResponse:0x000000012803b988>
=>
#<Langchain::LLM::OllamaResponse:0x0000000128038170
 @model="llama3",
 @prompt_tokens=nil,
 @raw_response=
  {"model"=>"llama3",
   "created_at"=>"2024-05-29T15:56:04.473077Z",
   "message"=>{"role"=>"assistant", "content"=>"Hey! How's it going?"},
   "done_reason"=>"stop",
   "done"=>true,
   "total_duration"=>11215288666,
   "load_duration"=>10900135000,
   "prompt_eval_count"=>11,
   "prompt_eval_duration"=>152727000,
   "eval_count"=>8,
   "eval_duration"=>156123000}>
dferrazm commented 4 months ago

@andreibondarev you have to pass stream: true. Should it be set by default? Probably fall back to true if a block is given, right?
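
A minimal sketch of that fallback, assuming chat merges its keyword arguments into the request body before calling the Ollama API. The method body, the client object, and its post signature below are illustrative assumptions, not the gem's actual implementation:

# Hypothetical sketch: default `stream` from the presence of a block
def chat(messages:, stream: nil, **params, &block)
  # Stream whenever the caller supplies a block, unless stream was set explicitly
  stream = !block.nil? if stream.nil?
  parameters = params.merge(messages: messages, stream: stream)
  client.post("api/chat", parameters, &block) # `client` is illustrative only
end

With that default in place, the IRB session above would stream without extra arguments. Until then, per this comment, passing stream: true explicitly alongside the block should work:

llm = Langchain::LLM::Ollama.new(url: ENV["OLLAMA_URL"])
llm.chat(messages: [{ role: "user", content: "hey" }], stream: true) do |chunk|
  print chunk.chat_completion
end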