neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Fix][Text Generation] Fix the outdated non-kv cache inference pathway #1328

Closed dbogunowicz closed 1 year ago

dbogunowicz commented 1 year ago

As always, our new features slightly broke non-kv cache inference. This PR updates that pathway. As a high-priority follow-up, I am finalizing the test suite so that it also includes basic testing of the non-kv cache pathway; I'm working on it as we speak.
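
For context, here is a rough sketch of what a basic parity check between the two pathways could look like. It does not use the DeepSparse API: the toy "model", `VOCAB_SIZE`, and every function name below are hypothetical stand-ins, and the integer `cache` only imitates the role of cached keys/values. The idea is simply that generation without a cache (re-processing the whole sequence at each step) should yield the same tokens as generation with a cache (processing only the newest token).

```python
from typing import List

VOCAB_SIZE = 50  # hypothetical toy vocabulary size


def next_token_full(tokens: List[int]) -> int:
    """Toy 'model': the next token depends on every token and its position."""
    return sum((pos + 1) * tok for pos, tok in enumerate(tokens)) % VOCAB_SIZE


def generate_without_cache(prompt: List[int], num_new: int) -> List[int]:
    """Non-kv cache pathway: re-process the full sequence at every step."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_token_full(tokens))
    return tokens[len(prompt):]


def generate_with_cache(prompt: List[int], num_new: int) -> List[int]:
    """Kv cache pathway: keep running state and process only the newest token."""
    cache = 0  # stands in for the cached keys/values (an assumption of this toy)
    for pos, tok in enumerate(prompt):
        cache += (pos + 1) * tok
    length = len(prompt)
    generated = []
    for _ in range(num_new):
        new_tok = cache % VOCAB_SIZE
        generated.append(new_tok)
        # Fold only the newly generated token into the cached state.
        cache += (length + 1) * new_tok
        length += 1
    return generated


def check_pathways_agree(prompt: List[int], num_new: int = 8) -> bool:
    """Basic test: both pathways must produce identical tokens."""
    return generate_without_cache(prompt, num_new) == generate_with_cache(prompt, num_new)
```

A test in the suite could then assert `check_pathways_agree(...)` for a handful of prompts, so that a feature which changes only one pathway is caught right away.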