b'{"created":"2024-01-19T00:12:28.434897","prompts":"Mario jumped","generations":[[{"text":" up and said, \\"I\'m so happy!\\"\\n\\nThe little girl smiled and said, \\"I\'m glad you\'re","score":null,"finished":true,"finished_reason":"length"},{"text":" up and said, \\"I\'m so happy!\\"\\n\\nThe little girl smiled and said, \\"I\'m glad you\'re","score":null,"finished":true,"finished_reason":"length"}]],"input_tokens":null}'
Client Code
# Standard library imports first, third-party second (PEP 8 grouping).
# Note: `argparse` was previously imported twice; the duplicate is removed.
import argparse
import time
from threading import Thread

import requests

# CLI for the stress test: how many concurrent client threads to launch,
# and how many tokens each request should ask the server to generate.
parser = argparse.ArgumentParser(
    description='Stress-test a DeepSparse text-generation endpoint with concurrent streaming clients.'
)
parser.add_argument('--num-threads', type=int, default=1)
parser.add_argument('--num-tokens', type=int, default=25)
def main(num_threads=1, num_tokens=25):
    """Launch `num_threads` concurrent streaming inference requests.

    Each thread POSTs one generation request to the local DeepSparse
    server, streams the response chunks to stdout, and reports its
    wall-clock latency.

    :param num_threads: number of concurrent client threads to start
    :param num_tokens: max_length passed to the server's generation kwargs
    """
    url = "http://localhost:5543/v2/models/text_generation-0/infer"

    def run(idx, prompt="Mario jumped"):
        # One request per thread: stream the response and time the full call.
        print(f"launching thread {idx}")
        start = time.perf_counter()
        obj = {
            "prompt": prompt,
            "generation_kwargs": {
                "max_length": num_tokens
            },
            "streaming": True,
            # Ask the server for two sequences per prompt to exercise
            # multi-sequence streaming.
            "num_return_sequences": 2
        }
        # stream=True so iter_lines() yields chunks as the server emits them.
        response = requests.post(url, json=obj, stream=True)
        for chunk in response.iter_lines():
            if chunk:  # iter_lines yields empty keep-alive lines; skip them
                print(chunk)
        end = time.perf_counter()
        print(f"finished thread {idx} : {(end - start): 0.5f}")

    ts = [Thread(target=run, args=[idx, "Mario jumped"]) for idx in range(num_threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
if __name__ == "__main__":
    # Parse CLI flags and run the concurrent stress test.
    args = parser.parse_args()
    main(num_threads=args.num_threads, num_tokens=args.num_tokens)
# Copyright (c) 2021 - present / Neuralmagic, Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from openai import OpenAI

# Connect to the locally hosted OpenAI-compatible DeepSparse server.
# The server does not check credentials, so any placeholder key works.
client = OpenAI(base_url="http://localhost:5543/v1", api_key="EMPTY")
models = client.models.list()

model = "hf:mgoin/TinyStories-1M-ds"
print(f"Accessing model API '{model}'")

# Completion API
stream = False
completion = client.chat.completions.create(
    # The chat completions API expects a list of role/content message
    # dicts, not a bare string (previously passed as messages="The dog").
    messages=[{"role": "user", "content": "The dog"}],
    max_tokens=10,
    stream=stream,
    model=model,
    logprobs=True,
)
print(completion)
Output:
ChatCompletion(id='cmpl-57bdfb468783482798577e66c9976f80', choices=[Choice(finish_reason='length', index=None, logprobs=None, message=ChatCompletionMessage(content='\n\n\nOnce upon a time, there was a', role='assistant', function_call=None, tool_calls=None))], created=1705622824, model='hf:mgoin/TinyStories-1M-ds', object='chat.completion', system_fingerprint=None, usage=CompletionUsage(completion_tokens=10, prompt_tokens=3, total_tokens=13))
Summary
max_tokens
which was added for a unit test. Updated the mock data for the test instead.
Testing
Deepsparse Server
Output (streaming):
Output (non-streaming):
Client Code
OpenAI
Client Code:
Output:
With Streaming Enabled: