wangzhaode / mnn-llm

LLM deployment project based on MNN.
Apache License 2.0
1.46k stars, 159 forks

memory budget reduction #177

Closed huangzhengxiang closed 5 months ago

huangzhengxiang commented 7 months ago

Add stream-llm (kv_chunk) support and resolve the prefill-phase memory explosion.
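For readers unfamiliar with the idea, here is a minimal sketch of chunked prefill in plain Python. It is not the code from this PR and does not use the MNN API. It assumes that "kv_chunk" means feeding the prompt through attention in fixed-size chunks while appending to a KV cache, which is one common way to bound prefill memory. The names `attend`, `prefill`, and `chunk_size` are hypothetical, and the stream-llm cache eviction side is not covered.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def attend(q_rows, k_cache, v_cache, q_offset):
    """Causal single-head attention for a block of query rows.

    q_offset is the absolute position of the first query row, so row i
    may only see cache entries 0 .. q_offset + i.
    """
    d = len(q_rows[0])
    out = []
    for i, q in enumerate(q_rows):
        visible = q_offset + i + 1  # causal mask
        scores = [
            sum(a * b for a, b in zip(q, k_cache[j])) / math.sqrt(d)
            for j in range(visible)
        ]
        w = softmax(scores)
        out.append([
            sum(w[j] * v_cache[j][c] for j in range(visible))
            for c in range(len(v_cache[0]))
        ])
    return out


def prefill(q, k, v, chunk_size):
    """Run prefill chunk by chunk (hypothetical helper, not the PR's code).

    Returns the attention output and the largest (query rows x cached keys)
    score block a dense implementation would have to materialize at once.
    """
    k_cache, v_cache, out = [], [], []
    peak_score_elems = 0
    for start in range(0, len(q), chunk_size):
        end = min(start + chunk_size, len(q))
        k_cache.extend(k[start:end])  # grow the KV cache by one chunk
        v_cache.extend(v[start:end])
        peak_score_elems = max(peak_score_elems, (end - start) * len(k_cache))
        out.extend(attend(q[start:end], k_cache, v_cache, start))
    return out, peak_score_elems
```

Calling `prefill(q, k, v, chunk_size=len(q))` corresponds to an unchunked prefill, whose score block grows quadratically with prompt length. With a smaller `chunk_size` the output is the same, but the largest score block is only `chunk_size` times the prompt length.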

github-actions[bot] commented 5 months ago

Marking as stale. No activity in 30 days.