neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

https://neuralmagic.com/deepsparse/

Other

2.97k stars, 171 forks

[Cherry-Pick] Fix the token_generator behavior for non-kv-cache models #1441

Closed: dbogunowicz closed this 9 months ago

dbogunowicz commented 9 months ago

(Partial) Cherry-Pick for #1324