Open JuliaMerz opened 10 months ago
Maybe we could change the callback to work with the actual tokens instead of the decoded string, that should make detecting the correct stop sequence simpler or is there a specific reason we need to work with strings here?
What cases of stop tokens aren't detected by the current implementation, could you provide an example? Rewinding the session should be simple, except for cases where there is some token merging happening in the tokenizer 🤔
See this segment of code
At the moment we don't catch every possible case of stop tokens, we catch some of them. Ideally we'd catch all cases, then rewind in order to bring the model into the state right after the stop token.