Open LoganDark opened 10 months ago
Previously: #3621, #3202, both closed by the stale bot.
Not stale.
Same with me
When I use an older version of text-generation-webui, the problem doesn't seem to be so obvious, and I have a feeling that it's the logic of the gradio front-end js processing that might be a bit of a problem here
> When I use an older version of text-generation-webui, the problem doesn't seem to be so obvious, and I have a feeling that it's the logic of the gradio front-end js processing that might be a bit of a problem here
It's the time that it takes the server to respond to requests; look in the network tab of the developer tools and you'll see. It's some hundreds of milliseconds, for a server running on localhost, which is insane. I know Python is slow, but it can't possibly be this slow. There's something wrong with gradio.
The correct way to handle this here would be to change the state of the UI first, then send a network request and give a hint that the request is in progress, just like all other chat software.
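The "update the UI first, then send the request" approach described above is the standard optimistic-update pattern. A minimal, framework-free Python sketch of the idea (the class and method names are hypothetical, not gradio APIs; the 0.3 s sleep stands in for a slow server):

```python
import threading
import time

class ChatState:
    """Hypothetical client-side chat state, updated optimistically."""

    def __init__(self):
        self.messages = []
        self.pending = set()

    def send(self, text, request_fn):
        # 1. Update local state immediately, before any network traffic,
        #    marking the message as in-progress so the UI can show a hint.
        msg_id = len(self.messages)
        self.messages.append({"id": msg_id, "text": text, "status": "sending"})
        self.pending.add(msg_id)

        # 2. Fire the (possibly slow) network request in the background.
        def worker():
            request_fn(text)
            self.messages[msg_id]["status"] = "sent"
            self.pending.discard(msg_id)

        t = threading.Thread(target=worker)
        t.start()
        return t

state = ChatState()
t = state.send("hello", lambda text: time.sleep(0.3))  # simulate a slow server
assert state.messages[0]["status"] == "sending"  # UI state updated instantly
t.join()
assert state.messages[0]["status"] == "sent"
```

This way the user sees feedback immediately, and the server's response time only affects when the "sending" hint resolves, not how responsive the interface feels.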
> When I use an older version of text-generation-webui, the problem doesn't seem to be so obvious, and I have a feeling that it's the logic of the gradio front-end js processing that might be a bit of a problem here

> It's the time that it takes the server to respond to requests; look in the network tab of the developer tools and you'll see. It's some hundreds of milliseconds, for a server running on localhost, which is insane. I know Python is slow, but it can't possibly be this slow. There's something wrong with gradio.

> The correct way to handle this here would be to change the state of the UI first
The issue is the delay in the server processing requests, not the lack of client-side progress bars or spinners (which are indeed present when you execute the model). Simply making the server respond faster would remove the need for any further mitigations.
Gradio is really fast for making small demos, but as an app becomes more and more functional, it seems to struggle with heavier-duty workloads.
Yeah, I don't think the server has any business doing 400 ms of processing before it even begins to serve a request.
I found the `Generate` code in module/ui_chat.py:

```python
shared.gradio['Generate'].click(
    ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
    lambda x: (x, ''), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
    chat.generate_chat_reply_wrapper, gradio(inputs), gradio('display', 'history'), show_progress=False).then(
    ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
    chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
    lambda: None, None, None, _js=f'() => {{{ui.audio_notification_js}}}')
```
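Part of the latency may come from how this chain is structured: each `.then()` registers a separate event, and the client fires them sequentially, so even a trivial click pays one client-server round-trip per step. A gradio-free sketch of the effect (the 50 ms per-hop latency is an arbitrary assumption for illustration):

```python
import time

def round_trip(fn, latency=0.05):
    """Simulate one client -> server -> client round-trip with fixed latency."""
    time.sleep(latency)
    return fn()

# A chain of five .then() steps, as in the snippet above, runs as five
# sequential events, each paying a full round-trip:
steps = [lambda: None] * 5
start = time.perf_counter()
for step in steps:
    round_trip(step)
chained = time.perf_counter() - start

# Collapsing the same work into a single handler pays the latency once:
start = time.perf_counter()
round_trip(lambda: [step() for step in steps])
merged = time.perf_counter() - start

assert merged < chained  # one round-trip instead of five
```

If this interpretation is right, merging steps that don't need intermediate UI updates into one handler would shave several round-trips off every click.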
`Generate` isn't a good benchmark because it's doing other work, like running the model. In the original post I used the Stop button while the model wasn't running. That should do absolutely nothing, since there is no actual work to do, but it still takes 400 ms for the server to even respond to the request.
Eventually I got too annoyed and moved to LM Studio instead, which is way easier to run, faster, better UI... it just doesn't do training, LoRAs, or non-GGUF models.
I've also used LM Studio, but I'd rather have a chat UI with user-friendly interactions that I can run in a browser, since I have various devices that need to run LLMs. I've also tried the various chat UIs listed here: [LLM webui](https://www.reddit.com/r/LocalLLaMA/comments/1847qt6/llm_webui_recommendations/), but none of them are good enough for my needs.
I'm working on my own UI too, so maybe one day there will be that. But it is definitely some months out unfortunately.
That's amazing to hear! Keep up the fantastic work, and I look forward to seeing the incredible results in the near future!
You might want to try ChatGPT-Next; then we just need to build an LLM JSON API with a format consistent with OpenAI's APIs.
I've been looking at my devtools and server responses very rarely take more than 2ms no matter what button I click. Console output is always instantaneous after pressing a button.
It would be great to see some actual results/benchmarks and/or video of the problem being reproduced or proven in some way.
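For anyone who wants to produce such numbers without staring at devtools, a small timing script works too. The URL and port below are assumptions; point it at whatever route the network tab shows the button hitting:

```python
import time
import urllib.request
import urllib.error

def time_request(url, payload=None, n=10):
    """Median wall-clock time for n requests to url, in seconds."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        try:
            urllib.request.urlopen(url, data=payload, timeout=10).read()
        except urllib.error.URLError:
            return None  # server not running / unreachable
        samples.append(time.perf_counter() - start)
    samples.sort()
    return samples[len(samples) // 2]

# Hypothetical local gradio instance -- adjust to your setup:
# print(time_request("http://127.0.0.1:7860/"))
```

A median over several requests smooths out one-off spikes, which matters when the claim is a consistent ~400 ms rather than an occasional hiccup.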
It was easy for me to reproduce this when I reported the issue. I'll record a video when I'm back home in a couple of days. Stupid Christmas trip...
I just cloned the repo fresh and tried again and I get mixed results.
Here's deleting a chat from the UI taking about half a second:
However, hitting Stop seems to be fast now at least when no model is being loaded:
However, once the model is generating, the latency starts to come back:
(this is discounting the fact that the generation took a couple seconds to actually stop)
I don't think it's my CPU being overloaded, since I'm testing with a 2-bit Phi-2 GGUF, which is positively tiny, and llama.cpp also uses only 4 cores by default. I haven't observed any difference in latency with or without my firewall running, so it's not that.
Some of the issue seems to have been fixed, but not all.
I don't know if you are still experiencing issues, but I've found that different browsers behave very differently. I have found Opera to work best on mobile devices, perhaps trying a different browser will resolve things.
Not stale.
Just adding my +1 that this issue is also affecting me. Possibly related: After the UI is open for some time, eventually buttons/interactions begin to stop working one by one. The tabs seem to always be usable, but other elements like the "Save History" button, the regenerate/remove last response/etc buttons, and the generate button simply stop functioning. Which particular buttons break seems to be random each time, and only seem to recover when I F5 the UI (or open a second instance in a new tab).
Not stale.
Describe the bug
When clicking buttons in the gradio web interface, there's a noticeable delay before the button press is actually received by the Python server. I'm not sure whether this delay is inherent to gradio or solvable in any way, but it severely hampers one use case: repeatedly generating with max tokens = 1 in order to closely supervise / collaborate with the model. So if there is any way to reduce or eliminate this delay, that would be really nice.
Is there an existing issue for this?
Reproduction
Have your terminal on-screen, and click a button in the interface that usually results in console output (e.g. the Generate button, saving a preset, etc.). Not only will the web interface take a second to show the orange outline, but it will also take a second for anything to show up in the console.
Devtools shows that the server takes around 400 ms to respond to even a simple "stop" request when it isn't even generating anything. Maybe a Python profiler should be run against this.
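If someone does take a profiler to it, the standard-library `cProfile` is enough to get started. A minimal sketch; `sum` below is just a stand-in workload, and you would wire in whatever handler you suspect instead:

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, top=15, **kwargs):
    """Run fn under cProfile and return (result, report of hottest functions)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args, **kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()

# Stand-in workload; replace with the suspect request handler:
_, report = profile_call(sum, range(1_000_000))
assert "function calls" in report
```

Sorting by cumulative time surfaces where those ~400 ms are actually spent, rather than which leaf functions are merely called often.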
Screenshot
No response
Logs
N/A
System Info