Now that token streaming support has merged (#397), we can enable streaming responses in the OpenAI-compatible RESTful API endpoint. This PR enables `stream=True` on the chat completions endpoint, so tokens are returned to the client as they are generated.

## Running the Server

## Client

```python
from openai import OpenAI

# Point the client at the local server; "ip:port" is a placeholder for
# wherever the server is running.
client = OpenAI(
    base_url="http://ip:port/v1",
    api_key="test",  # placeholder key for a local server
)

completion = client.chat.completions.create(
    model="mistralai/Mistral-7B-v0.1",
    messages=[
        {
            "role": "user",
            "content": "Tell me a joke.",
        },
    ],
    max_tokens=1024,
    stream=True,  # receive tokens incrementally instead of one final message
)

# Print each token as soon as it arrives.
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
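For reference, here is a rough sketch of what the client above does on the wire, using plain `requests` instead of the `openai` SDK. It assumes the endpoint follows the standard OpenAI streaming format (Server-Sent Events of the form `data: <json>`, terminated by `data: [DONE]`); the base URL is the same placeholder as above.

```python
import json

import requests

BASE_URL = "http://ip:port/v1"  # placeholder; same address as above

response = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        "model": "mistralai/Mistral-7B-v0.1",
        "messages": [{"role": "user", "content": "Tell me a joke."}],
        "max_tokens": 1024,
        "stream": True,
    },
    stream=True,  # read the body incrementally instead of buffering it all
)

# OpenAI-style streaming sends Server-Sent Events: each event is a line of
# the form "data: <json>", and the stream ends with "data: [DONE]".
for line in response.iter_lines():
    if not line:
        continue  # skip the blank lines that separate SSE events
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break
    delta = json.loads(payload)["choices"][0]["delta"]
    content = delta.get("content")
    if content:
        print(content, end="", flush=True)
print()
```

This is only an illustration of the wire format; the `openai` client shown above is the intended way to consume the endpoint.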