Closed invokerbyxv closed 2 months ago
cc @gante
Hi @invokerbyxv 👋
Following our issues guidelines, we reserve GitHub issues for bugs in the repository and/or feature requests. For any other matters, we'd like to invite you to use our forum or our discord 🤗. Since this is your first issue with us, I'm going to answer your question :)
Stopping generation based on what we see in the stream is not possible. However, we can encourage our LLM to avoid the behavior we dislike! In this case, repetitions can be tamed with these two `generate` flags (which you can also pass to a `pipeline`):

- `repetition_penalty=x`, which lowers the odds of the model repeating tokens if x > 1.0 (the higher the value, the bigger the impact)
- `no_repeat_ngram_size=n`, which forbids the model from repeating n-grams of size n (docs here, click on "Expand parameters")
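To make the first flag concrete, here is a toy sketch of the idea behind `repetition_penalty` (simplified from how transformers' repetition-penalty logits processor works on tensors, as an illustration rather than the library's actual code): tokens that already appeared get their logits pushed down before sampling.

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    # Tokens already generated are made less likely: positive logits are
    # divided by the penalty, negative logits are multiplied by it.
    # penalty > 1.0 discourages repetition; 1.0 is a no-op.
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 0 was already generated, so its logit 2.0 shrinks to 1.0 with penalty=2.0
print(apply_repetition_penalty([2.0, 1.0, -1.0], [0], 2.0))  # → [1.0, 1.0, -1.0]
```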
I'm using `TextIteratorStreamer` for streaming output. Since the LLM may repeat its output indefinitely, I would like to have it stop generating when a cancel request comes in.
Is there any way to accomplish this?
model: glm-4-9b-chat
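On the cancellation part of the question: one pattern worth noting (not covered in the reply above) is that `generate` accepts a `stopping_criteria` list, so a criterion that checks a thread-safe flag lets another thread abort an in-flight generation. A minimal, transformers-free sketch of such a flag follows; in real use it would subclass `transformers.StoppingCriteria`, and the class and method names here are illustrative.

```python
import threading

class CancelCriterion:
    """Thread-safe cancel flag shaped like a stopping criterion: the
    generation loop calls it once per step and stops when it returns True.
    In real code this would subclass transformers.StoppingCriteria and be
    passed via model.generate(stopping_criteria=...)."""

    def __init__(self):
        self._event = threading.Event()

    def cancel(self):
        # Called from another thread (e.g. the request handler) to abort.
        self._event.set()

    def __call__(self, input_ids=None, scores=None, **kwargs):
        return self._event.is_set()

criterion = CancelCriterion()
print(criterion())   # → False (generation continues)
criterion.cancel()
print(criterion())   # → True (generation stops at the next step)
```

Because `threading.Event` is thread-safe, the streaming consumer thread can call `cancel()` while `generate` runs in a worker thread alongside the `TextIteratorStreamer`.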