neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Text Generation][V2] `LinearRouter` to accept SPLIT/JOIN #1434

Open · dbogunowicz opened 9 months ago


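It seems that, fundamentally, the Pipeline level assumes that `ops` is a list, not a dictionary. `Pipeline.run` looks up the next operator with `self.ops[next_step]`, where `next_step` is an integer position, so when `ops` is a dictionary the lookup fails with `KeyError: 0`.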
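To make the failure mode concrete, here is a minimal, self-contained model of that loop. It assumes, as the traceback suggests, that the pipeline advances an integer step and indexes `ops[step]`; `run` and `linear_next` are hypothetical names for illustration, not deepsparse APIs.

```python
from typing import Any, Callable, Dict, List, Optional, Union

Op = Callable[[Any], Any]

def linear_next(step: int, num_ops: int) -> Optional[int]:
    """Hypothetical linear routing: advance by one until the last op."""
    nxt = step + 1
    return nxt if nxt < num_ops else None

def run(ops: Union[List[Op], Dict[str, Op]], value: Any) -> Any:
    """Hypothetical pipeline loop that routes with integer steps."""
    step: Optional[int] = 0
    while step is not None:
        # Works for a list; for a dict keyed by names this raises KeyError: 0
        value = ops[step](value)
        step = linear_next(step, len(ops))
    return value

list_ops = [str.strip, str.upper]
dict_ops = {"strip": str.strip, "upper": str.upper}

print(run(list_ops, "  hi  "))  # HI
try:
    run(dict_ops, "  hi  ")
except KeyError as err:
    print("KeyError:", err)  # KeyError: 0
```

The same integer-step router succeeds on a list and fails on a dict with the identical `KeyError: 0` seen in the report, which is consistent with the list-vs-dictionary diagnosis.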

To reproduce:

```python
from deepsparse.v2.text_generation import TextGenerationPipelineNoCache

prompt = ["Some funny prompt"]

pipeline = TextGenerationPipelineNoCache(model_path="hf:mgoin/TinyStories-1M-ds",
                                         onnx_model_name="model-orig.onnx",
                                         sequence_length=20)

out = pipeline(prompt=prompt)
```
```
Traceback (most recent call last):
  File "/home/ubuntu/.cache/JetBrains/RemoteDev/dist/67886da002816_pycharm-professional-231.9225.5/plugins/python/helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/home/ubuntu/.cache/JetBrains/RemoteDev/dist/67886da002816_pycharm-professional-231.9225.5/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/ubuntu/damian/deepsparse/hehe2.py", line 10, in <module>
    out = pipeline(prompt=prompt,
  File "/home/ubuntu/damian/deepsparse/src/deepsparse/v2/pipeline.py", line 265, in __call__
    return self.run(*args, **kwargs)
  File "/home/ubuntu/damian/deepsparse/src/deepsparse/v2/text_generation/pipeline_no_kv_cache.py", line 123, in run
    return super().run(*args, **kwargs)
  File "/home/ubuntu/damian/deepsparse/src/deepsparse/v2/pipeline.py", line 217, in run
    operator=self.ops[next_step],
KeyError: 0
```
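One possible direction, sketched here as an assumption rather than the fix adopted in the repository: normalize `ops` into an ordered mapping and let the linear router walk its keys, so that a list (keys `0, 1, ...`) and a dict (named keys) are handled the same way. `as_mapping`, `KeyedLinearRouter`, and `run` are hypothetical names, and the SPLIT/JOIN routing from the title is deliberately left out because its semantics are not described in this issue.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional, Sequence, Union

Op = Callable[[Any], Any]

def as_mapping(ops: Union[Sequence[Op], Mapping[Any, Op]]) -> Dict[Any, Op]:
    """Normalize ops: a list becomes {0: op0, 1: op1, ...}; a dict is copied."""
    if isinstance(ops, Mapping):
        return dict(ops)  # dicts preserve insertion order (Python 3.7+)
    return dict(enumerate(ops))

class KeyedLinearRouter:
    """Hypothetical router that follows op keys in order instead of 0..n-1."""

    def __init__(self, keys: List[Any]):
        self._first: Optional[Any] = keys[0] if keys else None
        # Precompute each key's successor; the last key maps to None (end).
        self._next: Dict[Any, Optional[Any]] = {
            key: keys[i + 1] if i + 1 < len(keys) else None
            for i, key in enumerate(keys)
        }

    def start(self) -> Optional[Any]:
        return self._first

    def next(self, current: Any) -> Optional[Any]:
        return self._next[current]

def run(ops: Union[Sequence[Op], Mapping[Any, Op]], value: Any) -> Any:
    """Hypothetical pipeline loop that routes by key, not by integer position."""
    table = as_mapping(ops)
    router = KeyedLinearRouter(list(table))
    step = router.start()
    while step is not None:
        value = table[step](value)
        step = router.next(step)
    return value

print(run([str.strip, str.upper], "  hi  "))                      # HI
print(run({"strip": str.strip, "upper": str.upper}, "  hi  "))   # HI
```

Precomputing the successor table keeps each routing step a single dictionary lookup, and because the router only ever sees keys, it no longer cares whether the pipeline stored its operators in a list or a dictionary.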