ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Implement KV cache reuse for completion #101

Closed (ngxson closed this issue 3 months ago)

ngxson commented 3 months ago

Equivalent to the `prompt_cache` option on the llama.cpp server: when a new completion request shares a prefix with the previous prompt, the KV cache entries for that prefix are kept, so only the new suffix tokens need a forward pass.
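A minimal sketch of the prefix-matching logic behind prompt-cache reuse. This is not wllama's actual implementation; the function names here are hypothetical, and only the general technique (retain the shared token prefix, evict the rest, evaluate the remainder) is taken from the issue:

```typescript
// Find how many leading tokens the new prompt shares with the cached one.
function commonPrefixLength(cached: number[], next: number[]): number {
  let n = 0;
  while (n < cached.length && n < next.length && cached[n] === next[n]) n++;
  return n;
}

// On each completion request, decide which KV cache entries to keep,
// which stale entries to evict, and which tokens still need evaluation.
// (Hypothetical helper, for illustration only.)
function planEvaluation(cachedTokens: number[], promptTokens: number[]) {
  const keep = commonPrefixLength(cachedTokens, promptTokens);
  return {
    keep,                                 // KV entries to retain as-is
    evict: cachedTokens.length - keep,    // stale entries past the shared prefix
    toEvaluate: promptTokens.slice(keep), // tokens that need a forward pass
  };
}
```

For example, if the previous prompt tokenized to `[1, 2, 3, 4]` and the new one to `[1, 2, 5]`, only token `5` would be evaluated, with the two stale entries after the shared prefix evicted from the cache.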