Prompt caching in `mlx_lm.server`

ml-explore / mlx-examples

Examples in the MLX framework

MIT License


Closed. awni closed this 1 month ago.

awni commented 1 month ago

Added a basic prompt cache to `mlx_lm.server` for chat mode. It is simple, but it does support multi-turn chat with cache reuse.

chimezie commented 1 month ago

This would not be backwards compatible with any later addition of batched input for `generate` (i.e., #948).