neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor] Migration #1460

Closed dsikka closed 8 months ago

dsikka commented 8 months ago

Summary

Testing

  1. You can load the new pipelines using the normal Pipeline.create(...) method.
  2. If a pipeline has not yet been registered with the new registry (i.e., not migrated to the new framework), Pipeline.create(...) still works; it falls back to the legacy pipeline class under the hood.
  3. To run the legacy version of a pipeline whose task has already been migrated (old text generation and old image classification), you have to use the legacy Pipeline under legacy/pipeline.py.

All 3 examples are shown below.
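Conceptually, the create-with-fallback dispatch described in points 1 and 2 works like the following. This is a minimal, self-contained sketch with hypothetical names (`NEW_REGISTRY`, `LEGACY_REGISTRY`, `create`), not the actual deepsparse internals:

```python
# Hypothetical sketch of "new registry with legacy fallback" dispatch.
# These registries and the create() helper are illustrative only --
# not the real deepsparse implementation.

NEW_REGISTRY = {"text_generation": "NewTextGenerationPipeline"}
LEGACY_REGISTRY = {
    "text_generation": "LegacyTextGenerationPipeline",
    "sentiment-analysis": "LegacySentimentAnalysisPipeline",
}


def create(task: str) -> str:
    """Prefer a task registered with the new framework; otherwise
    fall back to the legacy pipeline class under the hood."""
    if task in NEW_REGISTRY:
        return NEW_REGISTRY[task]
    if task in LEGACY_REGISTRY:
        return LEGACY_REGISTRY[task]
    raise ValueError(f"unknown task: {task}")


# Migrated task resolves to the new framework; an unmigrated
# task transparently falls back to its legacy class.
print(create("text_generation"))
print(create("sentiment-analysis"))
```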

Example:

Run the new text generation pipeline (with continuous batching, if that's what your heart desires):


```python
from deepsparse import Pipeline
from deepsparse.transformers.schemas.text_generation_schemas import TextGenerationInput

# Same model as in the legacy example below
model_path = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"
pipeline = Pipeline.create(
    task="text_generation",
    model_path=model_path,
    engine_type="deepsparse",
    internal_kv_cache=False,
    continuous_batch_sizes=[2, 4],
)

prompts = [["Hello there!", "The sun shined bright", "The dog barked"]]
for prompt in prompts:
    input_value = TextGenerationInput(
        prompt=prompt,
        generation_kwargs={
            "num_return_sequences": 4,
            "max_new_tokens": 20,
            "do_sample": True,
        },
    )
    output = pipeline(input_value)
    for generation in output.generations:
        print(generation)
        print("\n")
```

Run the old text_generation pipeline:


```python
from deepsparse.legacy.pipeline import Pipeline
from deepsparse.transformers.schemas.text_generation_schemas import TextGenerationInput

model_path = "hf:neuralmagic/mpt-7b-chat-pruned50-quant"
pipeline = Pipeline.create(
    task="text_generation",
    model_path=model_path,
    engine_type="deepsparse",
    internal_kv_cache=True,
)

prompts = [["Hello there!", "The sun shined bright", "The dog barked"]]
input_value = TextGenerationInput(
    prompt=prompts[0],
    generation_kwargs={
        "num_return_sequences": 4,
        "max_new_tokens": 20,
        "do_sample": True,
    },
)

output = pipeline(input_value)
for generation in output.generations:
    print(generation)
    print("\n")
```

Run any pipeline that has not yet been migrated to the new Pipeline class/framework:

```python
from deepsparse import Pipeline

sa_pipeline = Pipeline.create(
    task="sentiment-analysis",
    model_path="zoo:bert-large-sst2_wikipedia_bookcorpus-pruned90_quantized",
)

inference = sa_pipeline("I love it!")
```

Next Steps