oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0
38.74k stars 5.1k forks

Lag/delay when clicking buttons in the gradio UI #4138

Open LoganDark opened 10 months ago

LoganDark commented 10 months ago

Describe the bug

When clicking buttons in the gradio web interface, there's a noticeable delay before the button press is actually received by the python server. I am not sure whether this delay is inherent to gradio or solvable in any way, but it severely hampers one use-case: repeatedly generating with max tokens = 1 in order to closely supervise / collaborate with the model. So if there is any way to reduce or eliminate this delay that would be really nice

Is there an existing issue for this?

Reproduction

Have your terminal on-screen, and click a button in the interface that usually results in console output (e.g. the generate button, saving a preset, etc.). Not only will the web interface take a second to show the orange outline, but it will also take a second for anything to show up in the console.

Devtools shows that the server takes around 400 ms to respond to even a simple "stop" request when it isn't generating anything. Perhaps a Python profiler should be run against it.
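For reference, here is a rough, self-contained sketch of how one might measure HTTP round-trip latency on localhost. The no-op server is a stand-in; in practice you would point `measure()` at the webui's own endpoint instead:

```python
# Rough sketch: measure median HTTP round-trip latency on localhost.
# The NoOpHandler server is a stand-in target; point measure() at the
# webui's actual endpoint to reproduce the reported ~400 ms figure.
import statistics
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class NoOpHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        # Silence per-request logging so timing output stays readable.
        pass

def measure(url, n=20):
    """Return the median round-trip time in milliseconds over n GET requests."""
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        urllib.request.urlopen(url).read()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)

if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), NoOpHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"
    print(f"median round trip: {measure(url):.2f} ms")
    server.shutdown()
```

A bare localhost round trip like this should come in far under 400 ms, which is what makes the reported number suspicious.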

Screenshot

No response

Logs

N/A

System Info

Operating System: Windows 11 Enterprise 64-bit (10.0, Build 22621)
Language: English (Regional Setting: English)
System Manufacturer: Micro-Star International Co., Ltd.
System Model: MS-7D97
BIOS: 1.20
Processor: 12th Gen Intel(R) Core(TM) i5-12400F (12 CPUs), ~5.2GHz
Memory: 16384MB RAM
Page file: 45178MB used, 3833MB available

Name: NVIDIA GeForce RTX 3060
Manufacturer: NVIDIA
Chip Type: NVIDIA GeForce RTX 3060
DAC Type: Integrated RAMDAC
Device Type: Full Display Device
Approx. Total Memory: 20250 MB
Display Memory (VRAM): 12129 MB
Shared Memory: 8121 MB
Current Display Mode: 3840 x 2160 (32 bit) (60Hz)
Monitor: Generic PnP Monitor
HDR: Supported
LoganDark commented 10 months ago

Previously: #3621 #3202, closed by the stale bot.

LoganDark commented 9 months ago

Not stale.

LoganDark commented 9 months ago

Not stale.

LoganDark commented 9 months ago

Not stale.

LoganDark commented 9 months ago

Not stale.

LoganDark commented 8 months ago

Not stale.

LoganDark commented 8 months ago

Not stale.

LoganDark commented 8 months ago

Not stale.

LoganDark commented 8 months ago

Not stale.

LoganDark commented 7 months ago

Not stale.

LoganDark commented 7 months ago

Not stale.

LoganDark commented 7 months ago

Not stale.

gonjay commented 7 months ago

Same with me

gonjay commented 7 months ago

When I used an older version of text-generation-webui, the problem didn't seem as obvious. I have a feeling that the logic of the gradio front-end JS processing might be part of the problem here.

LoganDark commented 7 months ago

When I used an older version of text-generation-webui, the problem didn't seem as obvious. I have a feeling that the logic of the gradio front-end JS processing might be part of the problem here.

It's the time that it takes the server to respond to requests; look in the network tab of the developer tools and you'll see. It's some hundreds of milliseconds, for a server running on localhost, which is insane. I know Python is slow, but it can't possibly be this slow. There's something wrong with gradio.

gonjay commented 7 months ago

The correct way to handle this would be to change the state of the UI first, then send the network request and show a hint that the request is in progress, just like all other chat software.

When I used an older version of text-generation-webui, the problem didn't seem as obvious. I have a feeling that the logic of the gradio front-end JS processing might be part of the problem here.

It's the time that it takes the server to respond to requests; look in the network tab of the developer tools and you'll see. It's some hundreds of milliseconds, for a server running on localhost, which is insane. I know Python is slow, but it can't possibly be this slow. There's something wrong with gradio.

LoganDark commented 7 months ago

The correct way to handle this here would be to change the state of the UI first

The issue is the delay in the server processing requests, not the lack of client-side progress bars or spinners (which are indeed present when you execute the model). Simply making the server respond faster would remove the need for any further mitigations

gonjay commented 7 months ago

Gradio is really fast for building small demos, but as an app grows more and more functional, it seems to struggle to carry heavier workloads.

LoganDark commented 7 months ago

Yeah, I don't think the server has any business doing 400 ms of processing before it even begins to serve a request.

gonjay commented 7 months ago

I found the Generate code in modules/ui_chat.py:

    shared.gradio['Generate'].click(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
        lambda x: (x, ''), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
        chat.generate_chat_reply_wrapper, gradio(inputs), gradio('display', 'history'), show_progress=False).then(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
        chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
        lambda: None, None, None, _js=f'() => {{{ui.audio_notification_js}}}')
LoganDark commented 7 months ago

I found the Generate code in modules/ui_chat.py:

    shared.gradio['Generate'].click(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
        lambda x: (x, ''), gradio('textbox'), gradio('Chat input', 'textbox'), show_progress=False).then(
        chat.generate_chat_reply_wrapper, gradio(inputs), gradio('display', 'history'), show_progress=False).then(
        ui.gather_interface_values, gradio(shared.input_elements), gradio('interface_state')).then(
        chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
        lambda: None, None, None, _js=f'() => {{{ui.audio_notification_js}}}')

Generate isn't a good benchmark because it does other work, like starting the model. In the original post I used the stop button while the model wasn't running. That should do absolutely nothing, because there is no actual work to do, but it still takes 400 ms for the server to even respond to the request.
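As an aside on why chains like the one quoted above can hurt: if each `.then()` is dispatched as its own client-server step (which is how Gradio event chains appear to behave), any fixed per-request overhead multiplies by the number of steps. A toy simulation with assumed, not measured, numbers illustrates the effect:

```python
# Toy illustration of fixed per-step overhead accumulating across a
# sequential event chain, as in a Gradio .click().then().then()... chain.
# The 50 ms overhead figure is an assumed example, not a measured value.
PER_STEP_OVERHEAD_MS = 50

def chain_latency(step_work_ms, overhead_ms=PER_STEP_OVERHEAD_MS):
    """Total latency when each step runs sequentially with fixed overhead."""
    return sum(work + overhead_ms for work in step_work_ms)

# Six steps, mirroring the Generate chain; most steps do almost no real work.
steps = [5, 1, 200, 5, 10, 1]
total = chain_latency(steps)
print(f"{total} ms total, {len(steps) * PER_STEP_OVERHEAD_MS} ms of it overhead")
```

Under these assumptions, the overhead dominates everything except the actual generation step, which is consistent with small UI actions feeling sluggish even when they do no real work.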

LoganDark commented 7 months ago

Eventually I got too annoyed and moved to LM Studio instead, which is way easier to run, faster, better UI... it just doesn't do training, LoRAs, or non-GGUF models.

gonjay commented 7 months ago

I've also used LM Studio, but I'd rather have a chat UI with user-friendly interactions that I can run in a browser, since I have various devices that need to run LLMs. I've also tried the various chat UIs listed here: [LLM webui](https://www.reddit.com/r/LocalLLaMA/comments/1847qt6/llm_webui_recommendations/), but none of them are good enough for my needs.

LoganDark commented 7 months ago

I've also used LM Studio, but I'd rather have a chat UI with user-friendly interactions that I can run in a browser, since I have various devices that need to run LLMs. I've also tried the various chat UIs listed here: [LLM webui](https://www.reddit.com/r/LocalLLaMA/comments/1847qt6/llm_webui_recommendations/), but none of them are good enough for my needs.

I'm working on my own UI too, so maybe one day there will be that. But it is definitely some months out unfortunately.

gonjay commented 7 months ago

That's amazing to hear! Keep up the fantastic work, and I look forward to seeing the incredible results in the near future!

gonjay commented 7 months ago

You might want to try ChatGPT-Next below; then we just need to build an LLM JSON API with a format consistent with OpenAI's APIs.

TheLounger commented 7 months ago

I've been looking at my devtools and server responses very rarely take more than 2ms no matter what button I click. Console output is always instantaneous after pressing a button.

It would be great to see some actual results/benchmarks and/or video of the problem being reproduced or proven in some way.

LoganDark commented 7 months ago

It was easy for me to reproduce this when I reported the issue. I'll record a video when I'm back home in a couple of days. Stupid Christmas trip...

LoganDark commented 7 months ago

I just cloned the repo fresh and tried again and I get mixed results.

Here's deleting a chat from the UI taking about half a second:

(screenshot)

However, hitting Stop seems to be fast now, at least when no model is being loaded:

(screenshot)

However, once the model is generating, the latency starts to come back:

(screenshot)

(This is discounting the fact that the generation took a couple of seconds to actually stop.)

I don't think it's my CPU being overloaded, since I'm testing with a 2-bit Phi-2 GGUF, which is positively tiny, and llama.cpp also uses only 4 cores by default. I haven't observed any difference in latency with or without my firewall running, so it's not that.

Some of the issue seems to have been fixed, but not all.

RandomInternetPreson commented 7 months ago

I don't know if you are still experiencing issues, but I've found that different browsers behave very differently. I have found Opera to work best on mobile devices, perhaps trying a different browser will resolve things.

LoganDark commented 6 months ago

Not stale.

LoganDark commented 6 months ago

Not stale.

TheInvisibleMage commented 6 months ago

Just adding my +1 that this issue also affects me. Possibly related: after the UI has been open for some time, buttons/interactions eventually stop working one by one. The tabs always seem to remain usable, but other elements, like the "Save History" button, the regenerate/remove-last-response/etc. buttons, and the generate button, simply stop functioning. Which particular buttons break seems random each time, and they only recover when I F5 the UI (or open a second instance in a new tab).

LoganDark commented 6 months ago

Not stale.

LoganDark commented 5 months ago

Not stale.

LoganDark commented 5 months ago

Not stale.

LoganDark commented 5 months ago

Not stale.

LoganDark commented 5 months ago

Not stale.

LoganDark commented 4 months ago

Not stale.

LoganDark commented 4 months ago

Not stale.

LoganDark commented 4 months ago

Not stale.

LoganDark commented 4 months ago

Not stale.

LoganDark commented 3 months ago

Not stale.

LoganDark commented 3 months ago

Not stale.

LoganDark commented 3 months ago

Not stale.

LoganDark commented 3 months ago

Not stale.

LoganDark commented 3 months ago

Not stale.

LoganDark commented 2 months ago

Not stale.

LoganDark commented 2 months ago

Not stale.

LoganDark commented 2 months ago

Not stale.