I've noticed that if the -n parameter is big and the answer is short, the model starts to 'elaborate' on the topic just to produce the requested number of tokens, even though the relevancy of the generated text drops with every word. Is it possible to stop generation at the end of a completed sentence once the token's relevancy falls below some threshold?
And in the opposite case, when it generates quite good text, it just stops in the middle of a sentence upon reaching the -n limit. Why not let it finish the sentence?
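To make the idea concrete, here is a minimal Python sketch of a sampling loop implementing both behaviors. It assumes a hypothetical `next_token` callback standing in for the real model (it returns a token string and the probability the model assigned to it); the threshold value, the grace-token budget, and the sentence-end token set are all illustrative assumptions, not part of any existing implementation.

```python
# Tokens that are treated as ending a sentence (an assumption; a real
# implementation would consult the tokenizer).
SENTENCE_END = {".", "!", "?"}

def generate(next_token, n_limit, relevance_threshold=0.2, grace=16):
    """Sample tokens from `next_token(prefix) -> (token, prob)`.

    Two extra stopping rules on top of the plain -n limit:
    1. If the model's confidence in some token drops below
       `relevance_threshold`, stop at the next sentence boundary
       instead of rambling on to fill the token budget.
    2. If the -n limit is hit mid-sentence, keep going for up to
       `grace` extra tokens so the sentence can finish.
    """
    out = []
    low_confidence = False
    while True:
        token, prob = next_token(out)
        out.append(token)
        if prob < relevance_threshold:
            low_confidence = True
        at_sentence_end = token in SENTENCE_END
        # Rule 1: sentence is complete and relevancy has already dropped.
        if at_sentence_end and low_confidence:
            break
        # Rule 2: past the -n limit, stop at a sentence end, or hard-stop
        # once the grace budget is exhausted.
        if len(out) >= n_limit and (at_sentence_end or len(out) >= n_limit + grace):
            break
    return out
```

A scripted stand-in for the model shows both rules: with a large `-n`, generation stops at the first sentence boundary after confidence collapses; with a small `-n`, the current sentence is still allowed to finish.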