LangChain LLMs must return a `str` (see the method signature), so the call won't return a generator: other LangChain modules that expect a `str` will break if they get a generator object.
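For context, here is a paraphrase of that contract (the exact parameters vary across LangChain versions, so treat this as an approximation, not the real implementation):

```python
from typing import List, Optional

class LLM:  # paraphrase of LangChain's base class, not the actual code
    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        """The contract is a plain str return; yielding chunks would break callers."""
        ...
```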
But `callbacks=[StreamingStdOutCallbackHandler()]` should work and print text as it gets generated, token by token.
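For example, this minimal setup should print tokens to stdout as they are generated (using the small `marella/gpt-2-ggml` model for illustration; swap in your own model):

```python
from langchain.llms import CTransformers
from langchain.callbacks import StreamingStdOutCallbackHandler

llm = CTransformers(
    model='marella/gpt-2-ggml',
    callbacks=[StreamingStdOutCallbackHandler()],  # streams tokens to stdout
)
llm('AI is going to')
```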
There is a `stream()` method that some LLMs have (see this), which returns a generator, but it is an experimental feature, so I didn't add it to the `CTransformers` class.
It is possible to get a generator using the core library without LangChain:
```python
from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained('marella/gpt-2-ggml')

# stream=True yields text chunk by chunk instead of returning the full string.
for chunk in llm('AI is going to', stream=True):
    print(chunk, end='', flush=True)
```
But if you want to use it with LangChain only, I can send a PR to add the `stream()` method to the `CTransformers` class in LangChain.
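A rough sketch of what such a method might look like, assuming the LangChain wrapper keeps the underlying ctransformers model in its `client` attribute (the name and signature here are illustrative, not an actual PR):

```python
from typing import Iterator, List, Optional

def stream(self, prompt: str, stop: Optional[List[str]] = None) -> Iterator[str]:
    """Yield generated text chunk by chunk instead of returning a full str."""
    # Delegate to the core ctransformers model, which supports stream=True.
    for chunk in self.client(prompt, stop=stop, stream=True):
        yield chunk
```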
I tried passing the arguments listed in the documentation, but I'm getting nowhere:
```python
from langchain.llms import CTransformers
from langchain.callbacks import StdOutCallbackHandler, StreamingStdOutCallbackHandler

handler = StdOutCallbackHandler()
llm = CTransformers(
    model='TheBloke/MPT-7B-Instruct-GGML',
    model_file='mpt-7b-instruct.ggmlv3.q4_0.bin',
    model_type='mpt',
    config={'stream': True, 'max_new_tokens': 256, 'threads': 6},
    callbacks=[StreamingStdOutCallbackHandler()],
)
# PROMPT_FOR_GENERATION_FORMAT, content, and query are defined elsewhere in my code.
llm(PROMPT_FOR_GENERATION_FORMAT.format(context=content, query=query))
```

But it doesn't seem to work: it returns a string, not a generator. The model takes an extremely long time to think before it starts printing, and the response speed is about the same as without streaming.