ml-explore / mlx-examples

Examples in the MLX framework
MIT License

Prompt caching in `mlx_lm.server` #1026

Closed awni closed 1 month ago

awni commented 1 month ago

Added a basic prompt cache in `mlx_lm.server` for chat mode. It is simple, but it does support chatting with cache reuse.
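For readers unfamiliar with the idea, here is a minimal sketch of prefix-based cache reuse in a chat loop. It is not the code from this PR: `PrefixCache`, `common_prefix_len`, and `update` are hypothetical names, and the sketch only tracks token ids. A real server would also have to trim or reset the model's key/value cache to the reused length.

```python
def common_prefix_len(cached, new):
    """Return how many leading tokens the two sequences share."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n


class PrefixCache:
    """Hypothetical helper: remembers the tokens of the last prompt (one conversation only)."""

    def __init__(self):
        self.tokens = []

    def update(self, prompt):
        """Record the new prompt and return only the tokens that still need processing."""
        reused = common_prefix_len(self.tokens, prompt)
        # Always leave at least one token to process so the model can
        # produce logits for the next token, even on an exact repeat.
        reused = max(min(reused, len(prompt) - 1), 0)
        self.tokens = list(prompt)
        return prompt[reused:]


cache = PrefixCache()
first = cache.update([1, 2, 3])          # nothing cached yet: process everything
second = cache.update([1, 2, 3, 4, 5])   # chat continues: only the new suffix
third = cache.update([1, 2, 9])          # history diverged: reprocess from the split
```

In a chat session each request usually repeats the earlier turns verbatim, so most of the prompt falls in the shared prefix and only the newest message needs to be processed.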

chimezie commented 1 month ago

This would not be backward compatible with any later incorporation of batched input for generate (i.e., #948).