simonw / llm-llama-cpp

LLM plugin for running models using llama.cpp
Apache License 2.0

Output does not stream #11

Closed. bvanslyke closed this issue 1 year ago.

bvanslyke commented 1 year ago

Rather than streaming, all of the output shows up at the end, all at once.

But if I add a print statement just before each item is yielded, then I do see the text generated line by line from my print().

for item in stream:
    # Each item looks like this:
    # {'id': 'cmpl-00...', 'object': 'text_completion', 'created': .., 'model': '/path', 'choices': [
    #   {'text': '\n', 'index': 0, 'logprobs': None, 'finish_reason': None}
    # ]}
    print(item["choices"][0]["text"], end="")  # added this line for debugging
    yield item["choices"][0]["text"]
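For reference, here is a minimal standalone script that consumes the same kind of stream with llama-cpp-python directly; the model path and prompt are placeholders, not taken from the plugin. If this prints token by token, the buffering is happening above llama-cpp-python, which matches what the print() debugging suggests.

from llama_cpp import Llama

# Placeholder path: point this at any local model file llama.cpp can load.
model = Llama(model_path="/path/to/model.bin")

stream = model("Three reasons to stream output:", max_tokens=200, stream=True)
for item in stream:
    # Each item has the same dict shape as in the snippet above
    print(item["choices"][0]["text"], end="", flush=True)
print()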
vividfog commented 1 year ago

Simply setting can_stream = True enables streaming support in non-chat mode for me.

In llm_llama_cpp.py, at line 170 plus a new line 171:

class LlamaModel(llm.Model):
    can_stream = True

Together with adding max_tokens=4000 to streaming responses, as described in #6, the plugin appears to work as advertised. Those two would be good fixes to have in a release.
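For anyone patching this locally, here is a rough sketch of how the two fixes might fit together. The execute() signature comes from llm's plugin API; the model loading and prompt handling details are assumptions rather than the plugin's exact code, and in a real fix max_tokens would presumably be exposed as an option instead of hard-coded.

import llm
from llama_cpp import Llama


class LlamaModel(llm.Model):
    # Without this, llm treats the model as non-streaming and only
    # shows output once the whole response has been generated.
    can_stream = True

    def __init__(self, model_id, path):
        self.model_id = model_id
        self.path = path  # assumed: path to a local llama.cpp model file

    def execute(self, prompt, stream, response, conversation):
        model = Llama(model_path=self.path)
        completion = model(
            prompt.prompt,
            max_tokens=4000,  # the workaround described in #6
            stream=True,
        )
        for item in completion:
            yield item["choices"][0]["text"]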