Closed · bvanslyke closed this 1 year ago
Simply adding `can_stream = True` adds streaming support for non-chat mode for me. In `llm_llama_cpp.py`, at line 170 and a new line 171:
```python
class LlamaModel(llm.Model):
    can_stream = True
```
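For context, here is a minimal sketch of how `can_stream` pairs with a streaming `execute()` in an `llm` plugin model. The class body below is illustrative only: the `model_id` and the token source are placeholders, not the plugin's actual code.

```python
import llm


class LlamaModel(llm.Model):
    model_id = "llamacode"  # hypothetical id, for illustration only
    can_stream = True  # lets `llm` stream chunks as execute() yields them

    def execute(self, prompt, stream, response, conversation):
        # Placeholder token source; the real plugin calls llama-cpp-python here.
        for chunk in ["Hello", ", ", "world"]:
            yield chunk
```

Without `can_stream = True`, `llm` treats the model as non-streaming and only prints the collected output once `execute()` finishes.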
Together with adding `max_tokens=4000` to streaming responses, as described in #6, the plugin appears to work as advertised. Those two would be good fixes to have in a release.
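As a sketch of where that `max_tokens=4000` would go, assuming the plugin streams from llama-cpp-python's `Llama` callable (the function name and model path here are placeholders, not the plugin's real code):

```python
from llama_cpp import Llama


def stream_completion(model_path: str, prompt: str):
    """Yield completion text chunks; a sketch, not the plugin's actual code."""
    model = Llama(model_path=model_path)
    # stream=True makes the call return an iterator of chunks
    # instead of a single completed response dict.
    for chunk in model(prompt, max_tokens=4000, stream=True):
        yield chunk["choices"][0]["text"]
```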
Without that change, rather than streaming output I see all of the output show up at the end, all at once. But if I add a print statement before an output item is yielded, I see the text generated line by line from my print().
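As an illustration of that debugging trick, here is a self-contained stand-in for the plugin's `execute()` (tokens are simulated, nothing here is the plugin's real code):

```python
import time


def execute(prompt: str):
    """Stand-in for the plugin's execute(); tokens here are simulated."""
    for item in prompt.split():  # placeholder token source
        time.sleep(0.2)  # simulate generation latency
        print(item)  # debug print: shows tokens arrive incrementally
        yield item


# Consuming the generator all at once mimics the non-streaming behaviour:
# the debug prints appear one by one, but the joined result only at the end.
print(" ".join(execute("streaming works token by token")))
```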
With `can_stream = True`, running `llm -m llamacode "My prompt here"` on models added both with and without the `--llama2-chat` option, streaming works.