lobehub / lobe-chat

🤯 Lobe Chat - an open-source, modern-design LLMs/AI chat framework. Supports multiple AI providers (OpenAI / Claude 3 / Gemini / Ollama / Bedrock / Azure / Mistral / Perplexity), multimodal interaction (Vision/TTS), and a plugin system. One-click FREE deployment of your private ChatGPT chat application.
https://chat-preview.lobehub.com

[Feature Request] Display Inference Speed #2129

Open · domfahey opened 3 months ago

domfahey commented 3 months ago

🥰 Feature Description

[Screenshot: 2024-04-21_10-11-01]

Please consider adding the ability to display the inference speed for each interaction with the AI model.

🧐 Proposed Solution

This could be presented in a simple format such as "Round trip time: 2.52s", or as a more detailed breakdown like the example below:

|                    | Input | Output | Total |
| ------------------ | ----- | ------ | ----- |
| Speed (T/s)        | 868   | 723    | 731   |
| Tokens             | 33    | 480    | 513   |
| Inference Time (s) | 0.04  | 0.66   | 0.70  |

Displaying the inference speed would allow users to better understand the responsiveness of the AI model and help them gauge the performance of their queries. This information could also be useful for developers and researchers to optimize their models and improve the overall efficiency of LobeChat.
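For reference, the breakdown above is plain arithmetic over token counts and elapsed time. Below is a minimal TypeScript sketch of that calculation; the `speedFor` helper and its field names are hypothetical, not part of LobeChat, and in practice the timestamps would come from something like `performance.now()` taken before the request and after the final streamed chunk.

```ts
// Hypothetical sketch: derive speed metrics from token counts and
// wall-clock timestamps. Names and shapes are illustrative only.
interface SpeedMetrics {
  tokens: number;         // tokens in this segment
  inferenceTimeS: number; // elapsed wall-clock time, in seconds
  speedTps: number;       // tokens per second (T/s)
}

function speedFor(tokens: number, elapsedMs: number): SpeedMetrics {
  const inferenceTimeS = elapsedMs / 1000;
  return {
    tokens,
    inferenceTimeS,
    // Guard against division by zero for instantaneous segments.
    speedTps: inferenceTimeS > 0 ? tokens / inferenceTimeS : 0,
  };
}

// Example with the figures from the table above: 33 input tokens in
// ~0.04 s, 480 output tokens in ~0.66 s.
const input = speedFor(33, 40);
const output = speedFor(480, 660);
const total = speedFor(input.tokens + output.tokens, 40 + 660);

console.log(`Round trip time: ${total.inferenceTimeS.toFixed(2)}s`);
console.log(
  `Speed (T/s): input ${input.speedTps.toFixed(0)}, ` +
    `output ${output.speedTps.toFixed(0)}, total ${total.speedTps.toFixed(0)}`,
);
```

Running this prints speeds of roughly 825 / 727 / 733 T/s; the slightly different figures in the table presumably reflect more precise timings from the original run.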

📝 Additional Information

No response

lobehubbot commented 3 months ago

👀 @domfahey

Thank you for raising an issue. We will investigate the matter and get back to you as soon as possible. Please make sure you have given us as much context as possible.