marella / ctransformers

Python bindings for the Transformer models implemented in C/C++ using GGML library.
MIT License

how to apply stream into MPT-7B-Instruct-GGML model #17

Open SergeyTokarevHYS opened 1 year ago

SergeyTokarevHYS commented 1 year ago

I'm trying to pass the arguments listed in the documentation, but I'm getting nowhere.

handler = StdOutCallbackHandler()
llm = CTransformers(
    model='TheBloke/MPT-7B-Instruct-GGML',
    model_file='mpt-7b-instruct.ggmlv3.q4_0.bin',
    model_type='mpt',
    config={"stream": True, "max_new_tokens": 256, "threads": 6},
    callbacks=[StreamingStdOutCallbackHandler()],
)
llm(PROMPT_FOR_GENERATION_FORMAT.format(context=content, query=query))

But it doesn't seem to work. It does not return a generator; instead it returns a string. The model takes an extremely long time to think before it starts printing, and the response speed is about the same as without streaming.

marella commented 1 year ago

LangChain LLMs must return a str (see method signature), so it won't return a generator because other LangChain modules that expect a str will break if they get a generator object. But the callbacks=[StreamingStdOutCallbackHandler()] should work and print text as it gets generated token by token.
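As an aside, a custom callback handler is one way to capture the streamed tokens instead of only printing them. A minimal sketch, assuming LangChain's BaseCallbackHandler interface; TokenCollector is a hypothetical name, not part of ctransformers or LangChain:

from langchain.callbacks.base import BaseCallbackHandler

class TokenCollector(BaseCallbackHandler):
    # Hypothetical handler: collects tokens as they are generated.
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Called once per generated token when streaming is enabled.
        self.tokens.append(token)
        print(token, end='', flush=True)

Passing callbacks=[TokenCollector()] to CTransformers would then let you inspect the collected .tokens list after the call while still printing text as it arrives.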

Some LLMs have a stream() method (see this) which returns a generator, but it is an experimental feature, so I didn't add it to the CTransformers class.

It is possible to get a generator using the core library without LangChain:

from ctransformers import AutoModelForCausalLM

# Load the GGML model directly with the core library.
llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

# stream=True returns a generator that yields text as it is generated.
for chunk in llm('AI is going to', stream=True):
    print(chunk, end='', flush=True)

But if you want to use it only with LangChain, I can send a PR to add the stream() method to the CTransformers class in LangChain.
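For reference, a rough sketch of what such a stream() method could look like; this is an assumption, not the actual PR, and it presumes the LangChain wrapper keeps the underlying ctransformers model on a client attribute:

def stream(self, prompt: str):
    # Sketch only: yield text chunks as the underlying model generates them.
    # `self.client` is assumed to be the ctransformers model instance.
    for chunk in self.client(prompt, stream=True):
        yield chunk

With something like that in place, for chunk in llm.stream(prompt): ... would behave like the core-library example above.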