This fixes decoding for Mixtral and Llama. It also adds the Triton template configs and the default Triton configs we use to serve models, and updates the script that generates those default configs.
The decoding fix for Mixtral and Llama is to use token accumulation.
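For reviewers unfamiliar with the technique, here is a minimal sketch of what token accumulation usually means in streaming decoding. It assumes the underlying problem was that decoding each token on its own drops the word-boundary spaces of a SentencePiece-style tokenizer; the PR text does not state this. The toy vocabulary and the `decode`, `stream_naive`, and `stream_accumulated` helpers are hypothetical and are not the code in this change.

```python
# Toy stand-in for a SentencePiece-style tokenizer (hypothetical, illustration only).
# "▁" marks the start of a word, as in SentencePiece vocabularies.
VOCAB = {0: "▁Hello", 1: "▁wor", 2: "ld", 3: "▁again"}


def decode(token_ids):
    """Decode a list of ids, dropping the leading space of the whole sequence."""
    text = "".join(VOCAB[t] for t in token_ids).replace("▁", " ")
    return text[1:] if text.startswith(" ") else text


def stream_naive(token_ids):
    """Decode every token in isolation: each chunk loses its leading space."""
    return [decode([t]) for t in token_ids]


def stream_accumulated(token_ids):
    """Accumulate ids, decode the full prefix, and emit only the new suffix."""
    seen, emitted, chunks = [], "", []
    for t in token_ids:
        seen.append(t)
        text = decode(seen)
        chunks.append(text[len(emitted):])
        emitted = text
    return chunks


ids = [0, 1, 2, 3]
print("".join(stream_naive(ids)))        # Helloworldagain
print("".join(stream_accumulated(ids)))  # Hello world again
```

The accumulated variant re-decodes the whole prefix on every step, which costs more per token but keeps the streamed output identical to a single decode of the full sequence.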