
Enable streaming option in the OpenAI API server #480

adk9 commented 1 month ago

Now that token streaming support has been merged (#397), we can enable streaming responses in the OpenAI-compatible RESTful API endpoint.

This PR enables the streaming option in the OpenAI API server entrypoint. Example usage:

Running the Server

python -m mii.entrypoints.openai_api_server \
    --model "mistralai/Mistral-7B-Instruct-v0.1" \
    --port 3000 \
    --host 0.0.0.0
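
Once the server is up, a quick way to confirm it is reachable is to list the served models. This is a minimal sketch that assumes the server exposes the standard OpenAI /v1/models route; the host and port match the launch command above.

from openai import OpenAI

# Assumes the server implements the standard OpenAI /v1/models route.
# Host and port match the launch command above; the API key is a placeholder.
client = OpenAI(base_url="http://localhost:3000/v1", api_key="test")

for model in client.models.list():
    print(model.id)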

Client

from openai import OpenAI

# Point the client at the MII server started above (port 3000).
# The API key is a placeholder.
client = OpenAI(
    base_url="http://localhost:3000/v1",
    api_key="test",
)

# Request a streamed chat completion; the model name must match the
# one the server was launched with.
completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-Instruct-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Tell me a joke.",
        },
    ],
    max_tokens=1024,
    stream=True,
)

# Print each token as it arrives; chunks without content
# (e.g. the role preamble) are skipped.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
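
For reference, the same stream can be consumed at the wire level without the OpenAI client. This is a minimal sketch assuming the server follows OpenAI's server-sent-events format ("data: {...}" lines terminated by "data: [DONE]"); URL, model, and prompt match the example above.

import json
import requests

# Open a streaming POST to the chat completions endpoint.
response = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-Instruct-v0.1",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "max_tokens": 1024,
        "stream": True,
    },
    stream=True,
)

for line in response.iter_lines():
    if not line:
        continue
    # Each event is a "data: <json>" line; "[DONE]" marks the end of the stream.
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"]
    if "content" in delta:
        print(delta["content"], end="")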