neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Pipeline Refactor][server] Update deepsparse server to work with the new pipeline #1465

Closed. dsikka closed this pull request 8 months ago.

dsikka commented 8 months ago

Summary

Key Changes/Additions:

Testing

Sample config (`sample_config.yaml`):

```yaml
num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds
  - task: text_classification
    model: zoo:bert-large-sst2_wikipedia_bookcorpus-pruned90_quantized
```

Launch the server:

```bash
deepsparse.server --config_file sample_config.yaml
```
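
Loading and compiling the models can take some time after launch, so a client script may want to wait before sending its first request. This is a minimal sketch assuming the server listens on `localhost:5543`, the host and port in the request URL used in this PR. The helper `wait_for_port` is hypothetical and only checks that the TCP port accepts connections; an open port does not guarantee that every pipeline has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Success only means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait a little and retry.
            time.sleep(interval)
    return False
```

For example, call `wait_for_port("localhost", 5543)` after launching the server and only send requests once it returns `True`.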

Both models are served at their own endpoints, and requests to each endpoint run through that model's pipeline.
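
The text classification endpoint can be queried much like the text generation request in this PR. This sketch rests on two assumptions the PR does not state: that the endpoint is named `text_classification-1`, following the `<task>-<index>` pattern of `text_generation-0`, and that the text classification pipeline reads its inputs from a `sequences` field. The helpers `infer_url`, `classification_payload`, and `post_json` are hypothetical and use only the standard library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5543"


def infer_url(endpoint_name: str, base_url: str = BASE_URL) -> str:
    # Same /v2/models/<name>/infer route as the text generation request in this PR.
    return f"{base_url}/v2/models/{endpoint_name}/infer"


def classification_payload(sequences: list[str]) -> dict:
    # Assumption: the text classification pipeline expects a "sequences" field.
    return {"sequences": sequences}


def post_json(url: str, payload: dict, timeout: float = 60.0) -> dict:
    # POST the payload as JSON and decode the JSON response body.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running server, the call would be `post_json(infer_url("text_classification-1"), classification_payload(["I loved this movie!"]))`. Adjust the endpoint name if the server reports a different one.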

With continuous batching enabled on the text generation endpoint:

```yaml
num_cores: 2
num_workers: 2
endpoints:
  - task: text_generation
    model: hf:neuralmagic/TinyLlama-1.1B-Chat-v0.4-pruned50-quant-ds
    kwargs:
      continuous_batch_sizes: [2, 4]
      internal_kv_cache: false
  - task: text_classification
    model: zoo:bert-large-sst2_wikipedia_bookcorpus-pruned90_quantized
```
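
Judging by its name, `continuous_batch_sizes` lists the batch sizes the text generation pipeline is prepared to run. As a toy illustration of why having several sizes is useful, the hypothetical `plan_batches` helper below greedily groups queued requests into the largest listed size that still fits. It is not DeepSparse's scheduler and ignores the per-step scheduling that continuous batching performs during generation; it only shows the grouping idea.

```python
def plan_batches(num_pending: int, batch_sizes: list[int]) -> tuple[list[int], int]:
    """Toy grouping: fill the largest configured batch size first, then smaller ones.

    Returns the batch sizes to run and how many requests fit no configured size.
    Illustration only; this is not DeepSparse's continuous-batching scheduler.
    """
    batches = []
    remaining = num_pending
    for size in sorted(batch_sizes, reverse=True):
        # Keep emitting batches of this size while enough requests are queued.
        while remaining >= size:
            batches.append(size)
            remaining -= size
    return batches, remaining


# With continuous_batch_sizes [2, 4] and 7 queued requests:
print(plan_batches(7, [2, 4]))  # → ([4, 2], 1)
```

In this toy the leftover request is only reported; a real scheduler would need some other path for it.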

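Send a request:

```python
import requests

# Inference route for the text generation endpoint
url = "http://localhost:5543/v2/models/text_generation-0/infer"

obj = {
    "prompt": ["The sun shined", "Oh hello!"],
    "generation_kwargs": {"num_return_sequences": 4, "do_sample": True, "max_length": 20},
}

response = requests.post(url, json=obj)
response.raise_for_status()  # fail loudly on HTTP errors instead of printing an error body
print(response.json())
```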
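
For the server to batch work coming from separate clients, several requests need to be in flight at the same time. This is a minimal sketch that sends one request per prompt from a thread pool, assuming the same server, endpoint, and request body format as the example above. `make_payload`, `post_json`, and `send_concurrently` are hypothetical helpers built on the standard library.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumption: same server, endpoint, and body format as the request example above.
URL = "http://localhost:5543/v2/models/text_generation-0/infer"


def make_payload(prompt: str) -> dict:
    # One prompt per request, so each request is an independent unit of work.
    return {"prompt": [prompt], "generation_kwargs": {"do_sample": True, "max_length": 20}}


def post_json(url: str, payload: dict, timeout: float = 120.0) -> dict:
    # POST the payload as JSON and decode the JSON response body.
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def send_concurrently(url: str, prompts: list[str], max_workers: int = 4) -> list[dict]:
    # pool.map runs the requests in parallel threads and returns results in prompt order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda prompt: post_json(url, make_payload(prompt)), prompts))
```

With the server running, `send_concurrently(URL, ["The sun shined", "Oh hello!", "Once upon a time", "Today"])` returns one response dict per prompt, in the same order as the prompts.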

Output:

```text
{'created': '2023-12-11T23:11:31.675603', 'prompts': ['The sun shined', 'Oh hello!'], 'generations': [[{'text': 'through the tree branches, and it looked like a brilliant green star. Its light was so much br', 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': 'in all the clouds. They weren’t black, or blue, or red, but the', 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': 'on us in the hot afternoon. A few small birds were called out (the pigeons,', 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': 'on her,\nIn the warmest weltan of colors.\nThe breezes ble', 'score': None, 'finished': True, 'finished_reason': 'length'}], [{'text': 'I’m on a ship!\r\nWow, what a surprise! On an enormous', 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': "We're getting ready for our first day at U-Tam in four days time. I", 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': "Well you've probably heard of the Pagan Festivals in Europe but had never thought to", 'score': None, 'finished': True, 'finished_reason': 'length'}, {'text': "\nLet's just start from the beginning – and where were we? Now it wasn't", 'score': None, 'finished': True, 'finished_reason': 'length'}]], 'input_tokens': None}
```
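
The output shows the response layout: `generations` holds one list per prompt, and each inner list holds `num_return_sequences` entries with `text` and `finished_reason`. Here is a small sketch for post-processing such a response. The helpers `texts_by_prompt` and `finish_reason_counts` are hypothetical, and `sample` is the output above trimmed to one generation per prompt with shortened texts.

```python
from collections import Counter

# Output from above, trimmed to one generation per prompt, with texts shortened.
sample = {
    "prompts": ["The sun shined", "Oh hello!"],
    "generations": [
        [{"text": "through the tree branches", "score": None, "finished": True, "finished_reason": "length"}],
        [{"text": "I’m on a ship!", "score": None, "finished": True, "finished_reason": "length"}],
    ],
    "input_tokens": None,
}


def texts_by_prompt(response: dict) -> dict[str, list[str]]:
    """Map each prompt to its generated continuations (one per returned sequence)."""
    return {
        prompt: [generation["text"] for generation in generations]
        for prompt, generations in zip(response["prompts"], response["generations"])
    }


def finish_reason_counts(response: dict) -> Counter:
    """Count why each sequence stopped across all prompts."""
    return Counter(
        generation["finished_reason"]
        for generations in response["generations"]
        for generation in generations
    )


print(texts_by_prompt(sample))
print(finish_reason_counts(sample))  # → Counter({'length': 2})
```

In the full output above, all eight sequences report `'length'`, consistent with the `max_length` of 20 in the request.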