sandbox-ai / ali

el Asistente Legal Inteligente (the Intelligent Legal Assistant)

openai API response speed #1

Open tatakof opened 5 months ago

tatakof commented 5 months ago

Using GPT-4 with top_k=5, the response time is ~25s.

Besides switching to gpt-3.5-turbo, implementing streaming, and reducing the number of output tokens, check the following sources for more ideas:

https://community.openai.com/t/how-can-i-improve-response-times-from-the-openai-api-while-generating-responses-based-on-our-knowledge-base/237169

https://www.taivo.ai/stream/__making-gpt-api-responses-faster/

https://stackoverflow.com/questions/77170803/how-to-speed-up-the-gpt4-api

https://medium.com/technology-nineleaps/accelerating-gpt-4s-response-time-with-streaming-a-simple-explanation-b75ccb055c09
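For reference, a rough sketch of the streaming + capped output tokens idea, so we can measure time to first token separately from total time. It assumes the `openai>=1.0` Python client interface (`client.chat.completions.create(..., stream=True)`); the helper name `stream_answer` and the default `max_tokens=500` are made up for illustration and are not what the repo currently does.

```python
import time
from typing import Any


def stream_answer(
    client: Any,
    messages: list,
    model: str = "gpt-3.5-turbo-0125",
    max_tokens: int = 500,  # hypothetical cap; tune to the answer length we need
) -> tuple:
    """Stream a chat completion.

    Returns (full_text, seconds_to_first_token, total_seconds).
    `client` is expected to behave like `openai.OpenAI()`.
    """
    start = time.perf_counter()
    first_token_at = None
    parts = []

    stream = client.chat.completions.create(
        model=model,
        messages=messages,
        max_tokens=max_tokens,  # fewer output tokens means less generation time
        stream=True,            # chunks arrive as soon as they are generated
    )
    for chunk in stream:
        if not chunk.choices:
            continue  # some chunks carry no choices
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_at is None:
                first_token_at = time.perf_counter() - start
            parts.append(delta)

    total = time.perf_counter() - start
    if first_token_at is None:
        first_token_at = total  # nothing was streamed
    return "".join(parts), first_token_at, total
```

To try it against the real API, pass `OpenAI()` from the `openai` package as `client` (needs `OPENAI_API_KEY` set). Streaming should not shorten the total time by itself, but the user starts seeing text after the first chunk instead of waiting for the whole answer.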

tatakof commented 5 months ago

Internal testing (DNU-RAG) results:

| Model | Input price | Output price | Response time (top_k=5, max_tokens=1000) |
| --- | --- | --- | --- |
| gpt-4-0125-preview (128k) | $0.01 / 1K tokens | $0.03 / 1K tokens | ~27s |
| gpt-4-1106-preview (128k) | $0.01 / 1K tokens | $0.03 / 1K tokens | ~15s |
| gpt-4 (16k) | $0.03 / 1K tokens | $0.06 / 1K tokens | ~8s |
| gpt-3.5-turbo-0125 (16k) | $0.0005 / 1K tokens | $0.0015 / 1K tokens | ~2s |
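To put the prices above next to the timings, here is a small sketch that estimates the cost of one request per model. The prices are copied from the table; the 3,000 input tokens are only an assumed prompt size for a top_k=5 context (not measured), and 1,000 output tokens matches the max_tokens=1000 used in the test.

```python
# Prices per 1K tokens, copied from the table: (input, output) in USD.
PRICES_PER_1K = {
    "gpt-4-0125-preview": (0.01, 0.03),
    "gpt-4-1106-preview": (0.01, 0.03),
    "gpt-4": (0.03, 0.06),
    "gpt-3.5-turbo-0125": (0.0005, 0.0015),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request for the given token counts."""
    in_price, out_price = PRICES_PER_1K[model]
    return input_tokens / 1000 * in_price + output_tokens / 1000 * out_price


if __name__ == "__main__":
    # Assumed sizes: 3,000 prompt tokens (hypothetical), 1,000 completion tokens.
    for name in PRICES_PER_1K:
        print(f"{name}: ${request_cost(name, 3000, 1000):.4f} per request")
```

Under those assumptions a request costs about $0.15 on gpt-4, $0.06 on the gpt-4 preview models, and $0.003 on gpt-3.5-turbo-0125.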