neuralmagic / deepsparse

Sparsity-aware deep learning inference runtime for CPUs
https://neuralmagic.com/deepsparse/

[Server] Use the integration field when starting a server #1508

Closed dsikka closed 9 months ago

dsikka commented 9 months ago

Summary

Testing

The following config file will create an OpenAI-compatible server:

num_workers: 1
num_streams: 1
integration: openai
endpoints:
  - task: text_generation
    model: "hf:mgoin/TinyStories-1M-ds"
    kwargs:
      continuous_batch_sizes: [4, 8, 16]
      force_max_tokens: true
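
To exercise a server started from this config, a request in the OpenAI completions format could be sent to it. The sketch below is only illustrative: the host, port (`5543`), and route (`/v1/completions`) are assumptions not stated in this PR, and `build_completion_request` and `send` are hypothetical helper names. It uses only the Python standard library.

```python
import json
import urllib.request

# Assumptions (not stated in the PR): the server listens on localhost:5543
# and exposes an OpenAI-style completions route at /v1/completions.
BASE_URL = "http://localhost:5543"
COMPLETIONS_ROUTE = "/v1/completions"


def build_completion_request(
    prompt,
    model="hf:mgoin/TinyStories-1M-ds",  # model stub from the config above
    max_tokens=32,
    base_url=BASE_URL,
):
    """Build (but do not send) a POST request in the OpenAI completions format."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        url=base_url + COMPLETIONS_ROUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send(build_completion_request("Once upon a time"))` would return the decoded JSON response, provided the assumed address and route match the actual deployment.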