lm-sys / FastChat

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
Apache License 2.0
37k stars 4.56k forks

Start openai_api_server.py: the streaming interface takes a long wait of 23 seconds before any output appears, but the openai interface can return results within 3 seconds. How can I solve this problem? #1594

Open wgq910668 opened 1 year ago

wgq910668 commented 1 year ago

The streaming interface takes a long wait of 23 seconds before any output appears, but the openai interface can return results within 3 seconds. How can I solve this problem?

Inference runs on a V100 32 GB GPU.
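One way to pin down where the 23 seconds go is to measure time-to-first-chunk on the streaming response. The sketch below is a generic timing helper, not FastChat code; it works on any chunk iterator, such as the generator returned by an OpenAI-style client with `stream=True`, and is demonstrated here with a simulated slow stream:

```python
import time
from typing import Any, Iterable, Tuple


def time_to_first_chunk(stream: Iterable) -> Tuple[float, Any]:
    """Return (seconds until the first chunk arrived, that chunk).

    Pass any iterator, e.g. the generator from an OpenAI-compatible
    chat completion request with stream=True.
    """
    start = time.perf_counter()
    for chunk in stream:
        return time.perf_counter() - start, chunk
    raise RuntimeError("stream produced no chunks")


def slow_stream(delay: float, chunks):
    """Simulated stream that waits `delay` seconds before the first chunk."""
    time.sleep(delay)
    for c in chunks:
        yield c


latency, first = time_to_first_chunk(slow_stream(0.2, ["Hello", " world"]))
print(f"first chunk after {latency:.2f}s: {first!r}")
```

If the first streamed chunk arrives as late as the full non-streaming response, the delay is in generation startup (model load, tokenization, first forward pass) rather than in the streaming transport itself.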


merrymercy commented 1 year ago

cc @andy-yang-1 @jstzwj

jstzwj commented 1 year ago

I need more information about the model architecture and model size. Also, is the int8 argument enabled?
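For context, 8-bit weight loading in FastChat is controlled by the model worker's `--load-8bit` flag. A launch sketch follows; the model path and ports are placeholders, not the reporter's actual setup:

```
# Controller, a model worker with 8-bit weights, and the OpenAI-compatible API server.
python3 -m fastchat.serve.controller &
python3 -m fastchat.serve.model_worker \
    --model-path lmsys/vicuna-7b-v1.3 \
    --load-8bit &
python3 -m fastchat.serve.openai_api_server --host 0.0.0.0 --port 8000
```

8-bit loading reduces GPU memory use but can noticeably slow per-token generation, which is why it matters for diagnosing the reported latency.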