Closed: awni closed this 1 month ago
Added a basic prompt cache in `mlx_lm.server` for chat mode. This would not be backwards compatible with any later incorporation of batched input for generate (i.e., #948). But it does support chatting with cache reuse.
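
For reference, the idea behind the reuse is just prefix matching on token ids: keep track of the tokens already represented in the KV cache, and when the next request extends the same conversation, only prefill the new suffix; when the prompt diverges, start over. Below is a minimal, self-contained sketch of that bookkeeping (the `ChatPromptCache`, `common_prefix_len`, and `plan_request` names are illustrative only, not the server's actual API):

```python
# Sketch of prefix-based prompt-cache reuse for a chat server.
# The names and structure here are illustrative, not the mlx_lm.server code.

from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ChatPromptCache:
    """Tracks which tokens are already represented in the model's KV cache."""
    tokens: List[int] = field(default_factory=list)  # tokens currently cached
    kv_cache: object = None  # stand-in for the model-specific KV cache object


def common_prefix_len(a: List[int], b: List[int]) -> int:
    """Length of the shared prefix of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def plan_request(cache: ChatPromptCache, prompt_tokens: List[int]) -> Tuple[List[int], bool]:
    """Decide what the server actually needs to prefill for this request.

    Returns (tokens_to_process, reused): the suffix of the prompt not already
    covered by the cache, and whether the existing KV cache could be kept.
    """
    shared = common_prefix_len(cache.tokens, prompt_tokens)
    if shared == len(cache.tokens) and shared > 0:
        # The new prompt extends the cached conversation: prefill only the new part.
        cache.tokens = list(prompt_tokens)
        return prompt_tokens[shared:], True
    # The prompt diverged (e.g. a different conversation): reset and do a full prefill.
    cache.tokens = list(prompt_tokens)
    cache.kv_cache = None  # would be rebuilt by the model on the full prompt
    return list(prompt_tokens), False


if __name__ == "__main__":
    cache = ChatPromptCache()
    turn1 = [1, 15, 22, 8]             # e.g. system + first user message
    turn2 = turn1 + [99, 42, 7]        # same chat with one more exchange appended
    print(plan_request(cache, turn1))  # ([1, 15, 22, 8], False) -- cold cache
    print(plan_request(cache, turn2))  # ([99, 42, 7], True)     -- prefix reused
```

A real implementation also has to account for the tokens generated in the previous turn, which already sit in the KV cache but may not tokenize identically when they come back as part of the next prompt.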