neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/
Other
2.97k stars, 171 forks

[Cherry-Pick][Text Generation] Terminate the inference when kv cache is full #1447

Closed: dbogunowicz closed this pull request 9 months ago

dbogunowicz commented 9 months ago

Cherry-pick for #1446

tlrmchlsmth commented 9 months ago

Could we add a test case that covers this scenario?

+1
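The PR title describes stopping text generation once the KV cache has no free slots, and the reviewer asks for a test that catches this case. Here is a minimal, hypothetical illustration of that behavior. `KVCache`, `generate`, `next_token`, and the `"capacity"`/`"length"`/`"stop"` finish reasons are illustrative names, not deepsparse's API, and the code does not reproduce the actual change in #1446. It assumes one cache slot per token and checks for a free slot before each decoding step, so the loop ends cleanly instead of writing past the cache.

```python
# Minimal sketch, assuming a toy cache that holds one entry per token.
# KVCache, generate and the finish-reason strings are hypothetical names,
# not the deepsparse API.
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class KVCache:
    """Toy KV cache that only counts how many token positions are filled."""

    capacity: int
    length: int = 0

    def is_full(self) -> bool:
        return self.length >= self.capacity

    def append(self) -> None:
        # Writing past capacity is the failure mode this sketch guards against.
        if self.is_full():
            raise RuntimeError("KV cache overflow")
        self.length += 1


def generate(
    prompt_tokens: List[int],
    next_token: Callable[[List[int]], int],
    max_new_tokens: int,
    cache_capacity: int,
    eos_token: int,
) -> Tuple[List[int], str]:
    """Decode loop that stops cleanly once the KV cache is full."""
    cache = KVCache(capacity=cache_capacity)
    for _ in prompt_tokens:
        cache.append()  # a prompt longer than the cache raises here

    tokens = list(prompt_tokens)
    generated: List[int] = []
    finish_reason = "length"  # default: hit max_new_tokens
    for _ in range(max_new_tokens):
        if cache.is_full():
            finish_reason = "capacity"  # terminate instead of overflowing
            break
        token = next_token(tokens)
        cache.append()
        tokens.append(token)
        generated.append(token)
        if token == eos_token:
            finish_reason = "stop"
            break
    return generated, finish_reason


def test_terminates_when_kv_cache_is_full() -> None:
    # 5 prompt tokens + 3 generated tokens fill a capacity-8 cache.
    generated, reason = generate(
        [1, 2, 3, 4, 5], lambda toks: 7, max_new_tokens=10,
        cache_capacity=8, eos_token=0,
    )
    assert generated == [7, 7, 7]
    assert reason == "capacity"


if __name__ == "__main__":
    test_terminates_when_kv_cache_is_full()
    print("ok")
```

The test function is one shape the requested regression check could take: it fixes the prompt length and cache size so that generation has to stop on capacity, not on `max_new_tokens` or an end-of-sequence token.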