Open hans00 opened 5 months ago
Feature request
Support static shapes (automatically padding inputs to a maximum length) and a static-shape KV cache for LLMs.
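To make the padding half of the request concrete, here is a minimal sketch in plain Python. The function name `pad_to_max_length`, the `pad_id` default of 0, and the choice of right padding are assumptions for illustration only; none of them comes from the library. The point is that every input ends up with the same length, and an attention mask records which positions hold real tokens.

```python
from typing import List, Tuple


def pad_to_max_length(
    token_ids: List[int], max_length: int, pad_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Right-pad token_ids to exactly max_length and build a matching mask.

    Hypothetical helper: the name, pad_id default, and padding side are
    assumptions, not an existing API.
    """
    if len(token_ids) > max_length:
        raise ValueError(
            f"input has {len(token_ids)} tokens, more than max_length={max_length}"
        )
    n_pad = max_length - len(token_ids)
    # Every call returns lists of length max_length, so the shape is static.
    input_ids = token_ids + [pad_id] * n_pad
    # 1 marks a real token, 0 marks padding the model should ignore.
    attention_mask = [1] * len(token_ids) + [0] * n_pad
    return input_ids, attention_mask
```

For example, `pad_to_max_length([5, 6, 7], 5)` yields ids of length 5 with the last two positions masked out. Whether padding belongs on the left or the right depends on the model, so that would need to be configurable in a real implementation.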
Motivation
The lack of static shapes becomes a problem when enabling NPU, WebNN, or CoreML backends, which typically expect fixed tensor shapes.
Your contribution
I can submit a PR, but I'm not an expert on models and am not sure how to implement a static KV cache correctly.
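As a starting point for discussion, here is a rough sketch of what a static KV cache could look like, written in plain Python for a single attention head. The class `StaticKVCache` and its methods are hypothetical names, not an existing API, and a real implementation would use preallocated tensors per layer and head. The idea is to allocate the key/value buffers once at `max_length`, write each new step into the next free slot, and use a mask to hide the unused slots.

```python
from typing import List


class StaticKVCache:
    """Fixed-size key/value buffers for one attention head (illustrative only)."""

    def __init__(self, max_length: int, head_dim: int) -> None:
        self.max_length = max_length
        self.head_dim = head_dim
        # Buffers are allocated once; their shape never changes afterwards.
        self.keys: List[List[float]] = [[0.0] * head_dim for _ in range(max_length)]
        self.values: List[List[float]] = [[0.0] * head_dim for _ in range(max_length)]
        self.length = 0  # number of slots filled so far

    def append(self, key: List[float], value: List[float]) -> None:
        """Write one step's key/value into the next free slot."""
        if self.length >= self.max_length:
            raise OverflowError("static KV cache is full")
        if len(key) != self.head_dim or len(value) != self.head_dim:
            raise ValueError("key/value must have length head_dim")
        self.keys[self.length] = list(key)
        self.values[self.length] = list(value)
        self.length += 1

    def attention_mask(self) -> List[int]:
        """1 for filled slots, 0 for slots that are still empty."""
        return [1] * self.length + [0] * (self.max_length - self.length)
```

With this layout the model always sees key/value inputs of shape `(max_length, head_dim)`, and only the mask and the write position change between decoding steps. What to do once the cache is full (stop, slide the window, or evict) is left open here.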