neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor] async #1380

Closed: dsikka closed this 11 months ago

dsikka commented 1 year ago

Summary

Testing

The following script makes multiple calls (with different numbers of prompts) using the run_async function:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState

model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

# Two independent requests, each with a different number of prompts
prompts = [["Hello there!", "The sun shined bright"], ["The dog barked"]]

async def func(index):
    print("Hello World", index)
    # Each request gets its own InferenceState; the pipeline state is shared
    inference_state = InferenceState()
    inference_state.create_state({})
    pipeline_state = pipeline.pipeline_state

    input_value = TextGenerationInput(
        prompt=prompts[index], generation_kwargs={"max_length": 10}
    )
    return await pipeline.run_async(
        input_value,
        pipeline_state=pipeline_state,
        inference_state=inference_state
    )

async def main():
    # Run both requests concurrently and print the gathered outputs
    print(await asyncio.gather(*[func(i) for i in range(len(prompts))]))

asyncio.run(main())

Output:

Hello World 0
Hello World 1
[TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825126), prompts=['Hello there!', 'The sun shined bright'], generations=[GeneratedText(text='”\n\nThe little girl was so excited', score=None, finished=True, finished_reason='length'), GeneratedText(text=' and the sun was shining brightly.\n\nThe', score=None, finished=True, finished_reason='length')]), TextGenerationOutput(created=datetime.datetime(2023, 11, 10, 14, 27, 43, 825809), prompts=['The dog barked'], generations=[GeneratedText(text=' and ran away. He was so happy that he', score=None, finished=True, finished_reason='length')])]
tdg5 commented 11 months ago

Re: the assertion error I hit earlier: I ran 10 prompts 700 times with max_workers=1 and never hit the assertion error. So the related bug definitely has something to do with concurrency, but I can't offer any more insight into whether it is an issue with the engine or with this code when run concurrently.
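
For reference, here is a minimal sketch of the kind of stress harness described above. The exact script isn't included in this thread, so the prompt set, iteration count, and use of asyncio.gather are illustrative assumptions based on the test script in the PR description:

import asyncio

from deepsparse.transformers.pipelines.text_generation import TextGenerationInput
from deepsparse.v2.text_generation.pipeline import TextGenerationPipeline
from deepsparse.v2.utils import InferenceState

# Hypothetical stress harness: repeatedly fire several run_async calls at once
# to probe for the concurrency-related assertion error mentioned above.
model_path = "hf:mgoin/TinyStories-1M-deepsparse"
pipeline = TextGenerationPipeline(model_path, prompt_sequence_length=3)

# Illustrative values; the run described above used 10 prompts over 700 iterations
prompts = ["The dog barked"] * 10
num_iterations = 700

async def run_one(prompt):
    # A fresh InferenceState per request, mirroring the test script in the PR description
    inference_state = InferenceState()
    inference_state.create_state({})
    return await pipeline.run_async(
        TextGenerationInput(prompt=[prompt], generation_kwargs={"max_length": 10}),
        pipeline_state=pipeline.pipeline_state,
        inference_state=inference_state,
    )

async def main():
    for i in range(num_iterations):
        # Submit all prompts concurrently; a sequential run (max_workers=1) did not reproduce the error
        await asyncio.gather(*[run_one(p) for p in prompts])
        print("iteration", i, "ok")

asyncio.run(main())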

dsikka commented 11 months ago

Since the way we're scheduling operators has changed, we need to reassess the async functionality.

dsikka commented 11 months ago

This PR has been updated to use the new operator scheduling with the run_async function.