Open tatakof opened 5 months ago
Internal testing (DNU-RAG) results:
| Model | Input price | Output price | Response speed (top_k=5, max_tokens=1000) |
|---|---|---|---|
| gpt-4-0125-preview (128k) | $0.01 / 1K tokens | $0.03 / 1K tokens | ~27s |
| gpt-4-1106-preview (128k) | $0.01 / 1K tokens | $0.03 / 1K tokens | ~15s |
| gpt-4 (16k) | $0.03 / 1K tokens | $0.06 / 1K tokens | ~8s |
| gpt-3.5-turbo-0125 (16k) | $0.0005 / 1K tokens | $0.0015 / 1K tokens | ~2s |
Using gpt-4, the response speed with top_k=5 is ~25s.
On top of switching to gpt-3.5-turbo, implementing streaming, and reducing the number of output tokens, check the following sources for more ideas:
https://community.openai.com/t/how-can-i-improve-response-times-from-the-openai-api-while-generating-responses-based-on-our-knowledge-base/237169
https://www.taivo.ai/stream/__making-gpt-api-responses-faster/
https://stackoverflow.com/questions/77170803/how-to-speed-up-the-gpt4-api
https://medium.com/technology-nineleaps/accelerating-gpt-4s-response-time-with-streaming-a-simple-explanation-b75ccb055c09
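The three changes above can be combined in one place. Below is a minimal sketch (not our actual pipeline code) of the RAG answer step with the model switched to gpt-3.5-turbo-0125, a cap on output tokens, and streaming enabled so the first tokens reach the user almost immediately instead of after ~25s. It assumes the `openai>=1.0` Python client with `OPENAI_API_KEY` set; the `build_request` helper and the 300-token cap are illustrative, not values from our config.

```python
def build_request(question: str, context_chunks: list[str]) -> dict:
    """Assemble streaming chat-completion kwargs for the RAG answer step.

    Hypothetical helper: names and the max_tokens value are assumptions,
    not taken from the DNU-RAG codebase.
    """
    prompt = (
        "Answer using only the context below.\n\n"
        + "\n\n".join(context_chunks)
    )
    return {
        "model": "gpt-3.5-turbo-0125",  # ~2s vs ~27s for gpt-4-0125-preview
        "max_tokens": 300,              # fewer output tokens => faster, cheaper
        "stream": True,                 # tokens arrive incrementally
        "messages": [
            {"role": "system", "content": prompt},
            {"role": "user", "content": question},
        ],
    }


if __name__ == "__main__":
    from openai import OpenAI  # requires OPENAI_API_KEY in the environment

    client = OpenAI()
    stream = client.chat.completions.create(
        **build_request("What does the report conclude?", ["<retrieved chunk>"])
    )
    # Print tokens as they arrive instead of waiting for the full answer.
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)
```

Note that streaming does not reduce total generation time; it only cuts perceived latency (time to first token), so it pairs well with the model switch and the output-token cap, which reduce the actual wall-clock time.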