ngxson / wllama

WebAssembly binding for llama.cpp - Enabling on-browser LLM inference
https://huggingface.co/spaces/ngxson/wllama
MIT License

Implement KV cache reuse for completion #101

Closed (ngxson closed this issue 3 months ago)

ngxson commented 3 months ago

Equivalent to the `prompt_cache` option on the llama.cpp server: when a new completion request shares a prefix with the previous prompt, the KV cache entries for that prefix are kept, so only the new suffix tokens need a forward pass.
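A minimal sketch of the prefix-matching logic behind prompt-cache reuse. This is not wllama's actual implementation; the function names here are hypothetical, and only the general technique (retain the shared token prefix, evict the rest, evaluate the remainder) is taken from the issue:

```typescript
// Find how many leading tokens the new prompt shares with the cached one.
function commonPrefixLength(cached: number[], next: number[]): number {
  let n = 0;
  while (n < cached.length && n < next.length && cached[n] === next[n]) n++;
  return n;
}

// On each completion request, decide which KV cache entries to keep,
// which stale entries to evict, and which tokens still need evaluation.
// (Hypothetical helper, for illustration only.)
function planEvaluation(cachedTokens: number[], promptTokens: number[]) {
  const keep = commonPrefixLength(cachedTokens, promptTokens);
  return {
    keep,                                 // KV entries to retain as-is
    evict: cachedTokens.length - keep,    // stale entries past the shared prefix
    toEvaluate: promptTokens.slice(keep), // tokens that need a forward pass
  };
}
```

For example, if the previous prompt tokenized to `[1, 2, 3, 4]` and the new one to `[1, 2, 5]`, only token `5` would be evaluated, with the two stale entries after the shared prefix evicted from the cache.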