The LLM can generate forever in some scenarios. Here are a few techniques to terminate generation:
1. Use `options['stop']`. Models should come with stop parameters preconfigured, but these might not match your specific output.

   ```python
   chat(model=..., messages=..., options={'stop': ['This', 'will', 'stop', 'generation']})
   ```
2. Use `options['num_predict']`. This caps generation at a set number of tokens.

   ```python
   chat(model=..., messages=..., options={'num_predict': 100})
   ```
3. Stop the Python (async) generator. The LLM stops generating when the client connection closes, which means you can implement 1 or 2 from above client side, or implement your own termination criteria; see the sketch after this list.
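A minimal sketch of option 3, assuming the `ollama` package's `AsyncClient` and a placeholder `llama3.2` model name: exiting the `async for` loop closes the connection, so generation stops.

```python
import asyncio
from ollama import AsyncClient

async def main():
    received = ''
    # stream=True makes chat() return an async generator of chunks
    stream = await AsyncClient().chat(
        model='llama3.2',  # placeholder: use any model you have pulled
        messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
        stream=True,
    )
    async for chunk in stream:
        received += chunk['message']['content']
        # client-side termination criterion: stop after ~400 characters;
        # breaking out of the generator closes the connection
        if len(received) > 400:
            break
    print(received)

asyncio.run(main())
```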
> [!NOTE]
> Using `format=json` without telling the LLM to output JSON can create an infinite loop. You'll find more success with a prompt like "Why is the sky blue? Output in JSON format."
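For example (a sketch; `llama3.2` is again a placeholder), pairing `format='json'` with an explicit instruction in the prompt:

```python
import json
from ollama import chat

# an explicit "Output in JSON format" instruction alongside format='json'
# keeps the model from looping on its own output
response = chat(
    model='llama3.2',
    messages=[{'role': 'user', 'content': 'Why is the sky blue? Output in JSON format.'}],
    format='json',
)
data = json.loads(response['message']['content'])
print(data)
```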
The below results in an infinite number of new lines after the text returns. How do I give the async generator the stop command?
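One way to handle this, sketched on top of technique 3 above (the newline threshold and the `llama3.2` model name are assumptions): watch the accumulated text for a run of trailing newlines and break out of the generator, which closes the connection. Passing something like `options={'stop': ['\n\n']}` per technique 1 may also work.

```python
import asyncio
from ollama import AsyncClient

MAX_TRAILING_NEWLINES = 5  # assumption: this many blank lines means runaway output

async def main():
    text = ''
    stream = await AsyncClient().chat(
        model='llama3.2',  # placeholder model name
        messages=[{'role': 'user', 'content': 'Why is the sky blue?'}],
        stream=True,
    )
    async for chunk in stream:
        text += chunk['message']['content']
        # custom termination criterion: once the model emits nothing
        # but newlines, exit the generator to stop generation
        if text.endswith('\n' * MAX_TRAILING_NEWLINES):
            break
    print(text.rstrip())

asyncio.run(main())
```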