The output of my fine-tuned Mistral model ends abruptly, and I would ideally like it to finish the paragraph, sentence, or code block it was in the middle of.
I have set max_new_tokens = 300 and I also ask in the prompt to limit the response to 300 words, but the response is always long and ends abruptly. Is there any way to get a complete output within the desired number of output tokens?
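For context, one post-processing workaround I've been considering (not part of my current setup, just a sketch) is to generate with a slightly larger max_new_tokens and then trim the decoded text back to the last complete sentence. `trim_to_last_sentence` below is a hypothetical helper, not anything from the transformers API:

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Cut generated text back to the last sentence-ending punctuation mark."""
    # Find every ., !, or ? that is followed by whitespace or end-of-string.
    matches = list(re.finditer(r'[.!?](?=\s|$)', text))
    if not matches:
        return text  # no sentence boundary found; return unchanged
    return text[:matches[-1].end()]

raw = "The model explains the concept fully. Then it starts another idea but is cut"
print(trim_to_last_sentence(raw))  # → The model explains the concept fully.
```

This obviously wastes some tokens and doesn't help when the output is cut mid-code-block, so I'd prefer a generation-side solution if one exists (e.g. a stopping criterion that only fires at a sentence boundary).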
Here is the GenerationConfig I am using: