neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor] Operator Registry #1420

Closed dsikka closed 9 months ago

dsikka commented 9 months ago

Example using Operator.create(...) to instantiate the registered text generation pipeline operator:


from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.operators import Operator

model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = Operator.create(
    task="text_generation",
    model_path=model_path,
    prompt_sequence_length=3,
    engine_kwargs={"engine_type": "onnxruntime"},
)

def run_requests():
    # Each entry in `prompts` is one request; a request may hold several prompts.
    prompts = [["Hello there!", "How are you?"]]
    for prompt in prompts:
        input_value = TextGenerationInput(
            prompt=prompt,
            generation_kwargs={
                "do_sample": False,
                "max_length": 20,
            },
        )
        # Yield one pipeline output per request.
        yield pipeline(input_value)

# Print every generation returned for each request.
for output in run_requests():
    for generation in output.generations:
        print("\n")
        print(generation)
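
For readers unfamiliar with the registry pattern behind a call like Operator.create(task=...), here is a minimal, self-contained sketch of how a task name can be mapped to a class. This is not the DeepSparse implementation: ToyOperator, its register decorator, the underscore/hyphen normalization, and EchoTextGeneration are all hypothetical names and behaviors chosen for illustration only.

```python
from typing import Callable, Dict, Type


class ToyOperator:
    """Hypothetical base class holding a task-name -> class registry."""

    _registry: Dict[str, Type["ToyOperator"]] = {}

    @staticmethod
    def _normalize(task: str) -> str:
        # Assumption for this sketch: "text-generation" and "text_generation"
        # refer to the same task.
        return task.lower().replace("-", "_")

    @classmethod
    def register(cls, task: str) -> Callable[[type], type]:
        """Class decorator that records the decorated class under `task`."""

        def decorator(op_cls: type) -> type:
            key = cls._normalize(task)
            if key in cls._registry:
                raise ValueError(f"task {task!r} is already registered")
            cls._registry[key] = op_cls
            return op_cls

        return decorator

    @classmethod
    def create(cls, task: str, **kwargs) -> "ToyOperator":
        """Look up the class registered for `task` and instantiate it."""
        key = cls._normalize(task)
        if key not in cls._registry:
            raise ValueError(f"no operator registered for task {task!r}")
        return cls._registry[key](**kwargs)


@ToyOperator.register(task="text_generation")
class EchoTextGeneration(ToyOperator):
    """Stand-in operator: echoes the prompt instead of running a model."""

    def __init__(self, model_path: str, **kwargs):
        self.model_path = model_path
        self.kwargs = kwargs

    def __call__(self, prompt: str) -> str:
        return f"[{self.model_path}] {prompt}"


op = ToyOperator.create(task="text_generation", model_path="stub-model")
print(type(op).__name__)
```

The caller only needs the task string; which class gets constructed is decided by whatever was registered under that name, so new operators can be added without touching the create call sites.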