neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor] async #1380

Closed: dsikka closed this 11 months ago

dsikka commented 1 year ago

Summary

Testing

The following script makes multiple calls (with different numbers of prompts) using the run_async function:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState

model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

# Two independent requests, each with a different number of prompts
prompts = [["Hello there!", "The sun shined bright"], ["The dog barked"]]

async def func(index):
    print("Hello World", index)
    # Each request gets its own InferenceState; the pipeline state is shared
    inference_state = InferenceState()
    inference_state.create_state({})
    pipeline_state = pipeline.pipeline_state

    input_value = TextGenerationInput(
        prompt=prompts[index], generation_kwargs={"max_length": 10}
    )
    return await pipeline.run_async(
        input_value,
        pipeline_state=pipeline_state,
        inference_state=inference_state
    )

async def main():
    # Run both requests concurrently and print the gathered outputs
    print(await asyncio.gather(*[func(i) for i in range(len(prompts))]))

asyncio.run(main())

Output:

Hello World 0
Hello World 1
[TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825126), prompts=['Hello there!', 'The sun shined bright'], generations=[GeneratedText(text='”\n\nThe little girl was so excited', score=None, finished=True, finished_reason='length'), GeneratedText(text=' and the sun was shining brightly.\n\nThe', score=None, finished=True, finished_reason='length')]), TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825809), prompts=['The dog barked'], generations=[GeneratedText(text=' and ran away. He was so happy that he', score=None, finished=True, finished_reason='length')])]
tdg5 commented 11 months ago

Re: the assertion error I hit earlier: I ran 10 prompts 700 times with max_workers=1 and never hit the assertion error. So the related bug definitely has something to do with concurrency, but I can't offer any more insight into whether it is an issue with the engine or with this code when run concurrently.
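
For reference, here is a minimal sketch of the kind of stress harness described above. The exact script isn't included in this thread, so the prompt set, iteration count, and use of asyncio.gather are illustrative assumptions based on the test script in the PR description:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState

# Hypothetical stress harness: repeatedly fire several run_async calls at once
# to probe for the concurrency-related assertion error mentioned above.
model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

# Illustrative values; the run described above used 10 prompts over 700 iterations
prompts = ["The dog barked"] * 10
num_iterations = 700

async def run_one(prompt):
    # A fresh InferenceState per request, mirroring the test script in the PR description
    inference_state = InferenceState()
    inference_state.create_state({})
    return await pipeline.run_async(
        TextGenerationInput(prompt=[prompt], generation_kwargs={"max_length": 10}),
        pipeline_state=pipeline.pipeline_state,
        inference_state=inference_state,
    )

async def main():
    for i in range(num_iterations):
        # Submit all prompts concurrently; a sequential run (max_workers=1) did not reproduce the error
        await asyncio.gather(*[run_one(p) for p in prompts])
        print("iteration", i, "ok")

asyncio.run(main())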

dsikka commented 11 months ago

Since the way we're scheduling operators has changed, we need to reassess the async functionality.

dsikka commented 11 months ago

This PR has been updated to use the new operator scheduling with the run_async function.