neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor][Text-Generation][No KV Cache Pipeline] Prepare scaffolding for no-kv cache pipeline #1365

Closed dbogunowicz closed 10 months ago

dbogunowicz commented 10 months ago

This PR lays the groundwork for the next PR, which fully implements the TextGenerationPipelineNoCache. The main goal of this diff is to establish clean helper functions shared by both TextGenerationPipeline(s).
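For illustration only, here is a minimal sketch of what sharing a helper between a cached and a non-cached pipeline could look like. None of the names below (`pad_to_fixed_length`, `CachedPipelineSketch`, `NoCachePipelineSketch`) come from this PR or from the deepsparse codebase; they are hypothetical stand-ins, and the padding logic is an assumption about the kind of utility both pipelines might need.

```python
from typing import List, Tuple


def pad_to_fixed_length(
    token_ids: List[int], sequence_length: int, pad_token_id: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical shared helper: truncate or right-pad token ids to a
    fixed length and build the matching attention mask."""
    kept = token_ids[:sequence_length]
    padding = sequence_length - len(kept)
    input_ids = kept + [pad_token_id] * padding
    attention_mask = [1] * len(kept) + [0] * padding
    return input_ids, attention_mask


class CachedPipelineSketch:
    """Stand-in for a pipeline with a KV cache (hypothetical).

    With a cache, only the tokens not yet processed need to be fed in.
    """

    def __init__(self, chunk_length: int):
        self.chunk_length = chunk_length

    def prepare(self, new_token_ids: List[int]) -> Tuple[List[int], List[int]]:
        return pad_to_fixed_length(new_token_ids, self.chunk_length)


class NoCachePipelineSketch:
    """Stand-in for a pipeline without a KV cache (hypothetical).

    Without a cache, the full sequence is fed in on every forward pass.
    """

    def __init__(self, sequence_length: int):
        self.sequence_length = sequence_length

    def prepare(self, all_token_ids: List[int]) -> Tuple[List[int], List[int]]:
        return pad_to_fixed_length(all_token_ids, self.sequence_length)


if __name__ == "__main__":
    prompt = [5, 6, 7]
    print(CachedPipelineSketch(chunk_length=4).prepare(prompt[-1:]))
    print(NoCachePipelineSketch(sequence_length=5).prepare(prompt))
```

Both classes delegate to the same function, so a fix to the padding or masking logic applies to either pipeline without duplicating code.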