replicate / cog-triton

A cog implementation of Nvidia's Triton server
Apache License 2.0

Joe/lang 200 patch trt llm triton backend stop sequences #9

Closed joehoover closed 5 months ago

joehoover commented 5 months ago

This PR mostly fixes stop sequences; however, there is one remaining edge case.

If the user-specified stop sequence doesn't exactly match the generated stop sequence IDs, the current code pops the match out of the output but continues generating. This isn't ideal and needs to be handled better, but the fundamental problem is that we can't stop Triton from the client.

So we sort of need to say: your stop sequence almost matched, but not enough to matter, so here's the entire sequence. Please try again.
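
For illustration, here is a minimal sketch of one way a client could handle this at the text level. It is not the code in this PR. The function name `stream_until_stop` and its arguments are hypothetical, and it assumes the client receives decoded text chunks rather than token IDs. It holds back any trailing text that could be the start of the stop sequence, releases that text in full if the match falls through, and ends the output once the stop sequence is matched exactly.

```python
from typing import Iterable, Iterator


def stream_until_stop(chunks: Iterable[str], stop: str) -> Iterator[str]:
    """Yield text from `chunks`, ending just before the first full `stop` match.

    Hypothetical helper: operates on decoded text, not token IDs.
    """
    if not stop:
        # No stop sequence configured: pass everything through.
        yield from chunks
        return

    buffer = ""
    for chunk in chunks:
        buffer += chunk

        idx = buffer.find(stop)
        if idx != -1:
            # Exact match: emit the text before it and end the output.
            # This only ends the client-side stream; it does not stop
            # generation on the server.
            if idx:
                yield buffer[:idx]
            return

        # Hold back the longest suffix of the buffer that is a proper
        # prefix of the stop sequence, since it may complete later.
        hold = 0
        for k in range(min(len(stop) - 1, len(buffer)), 0, -1):
            if buffer.endswith(stop[:k]):
                hold = k
                break

        emit = buffer[: len(buffer) - hold]
        if emit:
            yield emit
        buffer = buffer[len(buffer) - hold:]

    # The stream ended on a partial match only: return the held-back text.
    if buffer:
        yield buffer
```

With a stop sequence of `</s>`, the chunks `"a<"`, `"/b"`, `"c"` come back unchanged as `a</bc`, because the partial match `<` never completes and is released with the rest of the text.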

I'm going to merge with this known bug so we can unblock on \n stopping.