This PR mostly fixes stop sequences; however, there is one edge case.
If the user-specified stop sequence doesn't exactly match the generated stop sequence's token IDs, the current code pops the match out but continues generating. This isn't ideal and needs to be handled better, but the fundamental problem is that we can't stop Triton from the client.
So we more or less need to tell the caller: your stop sequence almost matched, but not closely enough to matter, so here's the entire sequence. Please try again.
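
For illustration only, here is a minimal sketch of one possible client-side fallback: since generation can't be interrupted on the Triton side, the caller could trim the returned text at the first stop sequence it contains. This is not the code in this PR. The function name `truncate_at_stop` and its return shape are hypothetical, and it works on decoded text rather than token IDs.

```python
from typing import List, Optional, Tuple


def truncate_at_stop(text: str, stop_sequences: List[str]) -> Tuple[str, Optional[str]]:
    """Cut `text` at the earliest occurrence of any stop sequence.

    Hypothetical helper: returns the (possibly truncated) text and the stop
    sequence that matched, or None if no stop sequence appears in the text.
    """
    best_idx = len(text)
    matched: Optional[str] = None
    for stop in stop_sequences:
        if not stop:
            # An empty stop sequence would match at index 0; skip it.
            continue
        idx = text.find(stop)
        if idx != -1 and idx < best_idx:
            best_idx = idx
            matched = stop
    return text[:best_idx], matched
```

If `matched` comes back as `None`, the caller gets the entire sequence and can decide whether to retry, which is roughly the "here's the entire sequence, please try again" behavior described above.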
I'm going to merge with this known bug so we can unblock on \n stopping.