[Cherry-Pick][Text Generation] Terminate the inference when kv cache is full

neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

https://neuralmagic.com/deepsparse/


Closed: dbogunowicz closed this 9 months ago

dbogunowicz commented 9 months ago

Cherry-pick for #1446

tlrmchlsmth commented 9 months ago

Could we add a test case that catches this?