neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs

https://neuralmagic.com/deepsparse/

Closed dsikka closed 6 months ago

dsikka commented 6 months ago

Summary

When the NLEngineOperator was updated to process batch_size > 1 with internal_kv_cache enabled (for continuous_batching), one line was missed in that update PR. This PR fixes the omission and also cleans up some of the conditions.
This also fixes the following ticket: https://app.asana.com/0/1205229323407165/1206239324632274/f