replicate / cog-triton

A Cog implementation of NVIDIA's Triton server
Apache License 2.0

Joe/lang 193 fix mistral decoding #5

Closed by joehoover 6 months ago

joehoover commented 6 months ago

This fixes decoding for Mixtral and Llama. It also adds Triton template configs and the default Triton configs we use to serve models. Finally, it updates the script we use to generate the default configs.

The fix for Mixtral and Llama decoding was to use token accumulation.
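
For readers unfamiliar with the technique, here is a minimal sketch of what token accumulation can look like in a streaming decoder. It is not the code from this PR. The toy vocabulary, `toy_decode`, `naive_stream`, and `accumulated_stream` are all hypothetical names invented for illustration, and the sketch assumes a SentencePiece-style tokenizer where `▁` marks a word boundary and decoding drops the leading space of whatever sequence it is given. Under that assumption, decoding each token on its own loses the spaces, while decoding the accumulated sequence and emitting only the new suffix keeps them.

```python
from typing import Iterable, Iterator, List

# Hypothetical toy vocabulary in the SentencePiece style: "▁" marks a word boundary.
VOCAB = {0: "▁Hello", 1: "▁wor", 2: "ld", 3: "▁again"}


def toy_decode(token_ids: List[int]) -> str:
    """Join pieces, map "▁" to a space, and drop the leading space of the sequence."""
    text = "".join(VOCAB[i] for i in token_ids).replace("▁", " ")
    return text[1:] if text.startswith(" ") else text


def naive_stream(token_ids: Iterable[int]) -> Iterator[str]:
    """Decode each token in isolation; every word-initial space is stripped."""
    for token_id in token_ids:
        yield toy_decode([token_id])


def accumulated_stream(token_ids: Iterable[int]) -> Iterator[str]:
    """Decode the accumulated sequence and emit only the text not yet emitted."""
    seen: List[int] = []
    emitted = ""
    for token_id in token_ids:
        seen.append(token_id)
        text = toy_decode(seen)
        yield text[len(emitted):]
        emitted = text


tokens = [0, 1, 2, 3]
naive_output = "".join(naive_stream(tokens))
accumulated_output = "".join(accumulated_stream(tokens))
print(naive_output)        # Helloworldagain
print(accumulated_output)  # Hello world again
```

The sketch re-decodes the whole sequence on every step, which is simple but quadratic in output length; a real implementation might keep only a short window of recent tokens.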

linear[bot] commented 6 months ago
LANG-193 Fix mistral decoding

Our current cog-triton implementation doesn't decode Mistral and Llama output correctly: the emitted tokens have no spaces between them.