oobabooga / text-generation-webui

A Gradio web UI for Large Language Models.
GNU Affero General Public License v3.0

Extension long_replies slows down output and cpu #5814

Closed: djbritt closed this issue 7 months ago

djbritt commented 7 months ago

Describe the bug

I had many conversations with the long_replies extension turned on and the minimum length set to 400, using these two models: loyalmacaroni-7b and sonya-7b.

As the reply content reached 400, CPU usage dropped and the output speed fell sharply.

When I turn off the extension, neither CPU usage nor output speed drops.

Is there an existing issue for this?

Reproduction

Load one of the models mentioned above, set max_new_tokens to 4096, and increase the min length setting below the main input bar to 400. Then try to get a response that reaches that length, and check whether the output slows down and CPU usage drops.
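To make the "output slows down" check less subjective, the per-token arrival times of a long reply can be turned into throughput figures. The following is a minimal sketch, not part of the web UI: the function names, the 50-token window, and the 0.8 tolerance are all hypothetical choices, and it assumes you have recorded a timestamp (in seconds) for each generated token yourself.

```python
from typing import List


def windowed_tokens_per_second(timestamps: List[float], window: int = 50) -> List[float]:
    """Return tokens/second for each consecutive block of `window` tokens.

    `timestamps` is assumed to hold the arrival time, in seconds, of every
    generated token, in order. A trailing partial block is ignored.
    """
    rates = []
    for start in range(0, len(timestamps) - window, window):
        elapsed = timestamps[start + window] - timestamps[start]
        rates.append(window / elapsed if elapsed > 0 else float("inf"))
    return rates


def slows_down(rates: List[float], tolerance: float = 0.8) -> bool:
    """True if the last window is slower than `tolerance` times the first one."""
    return len(rates) >= 2 and rates[-1] < tolerance * rates[0]


if __name__ == "__main__":
    # Hypothetical data: 50 tokens at 1 s each, then 50 tokens at 2 s each.
    demo = list(range(0, 51)) + [50 + 2 * i for i in range(1, 51)]
    demo_rates = windowed_tokens_per_second(demo)
    print(demo_rates, slows_down(demo_rates))
```

Comparing the result of one run with the extension enabled against one with it disabled would show whether the slowdown is tied to the extension or appears in both cases.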

Screenshot

Screenshot from 2024-04-05 20-55-24

This is normal output without the extension enabled. With the extension enabled, it shows a downward slope instead, slowly decreasing.

Logs

na

System Info

Ubuntu 22.02
MSI - Bravo 15 15.6" 144hz Gaming Laptop FHD - Ryzen 5-7535HS with 16GB RAM - GeForce RTX 4050 with 6G GDDR6 - 1TB NVMe SSD - Black

djbritt commented 7 months ago

Ah, sorry, I think this is actually a limitation of my PC. I am getting a weird slowdown even with the extension disabled.

djbritt commented 7 months ago

@oobabooga - I increased n_batch to 1024, and this issue went away. In your wiki, under "4 - Model tab", you say that increasing n_batch hasn't given you a speed increase, but I can tell you it helped me.
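For context on what n_batch changes: in llama.cpp-based loaders it caps how many prompt tokens are evaluated together in one call, so a larger value means fewer evaluation calls for the same prompt. The sketch below only illustrates that arithmetic. The function name, the 3000-token prompt, and the comparison value of 512 are hypothetical, and it does not model actual speed or explain why the slowdown disappeared on this machine.

```python
import math


def prompt_batches(prompt_tokens: int, n_batch: int) -> int:
    """Number of evaluation calls needed for a prompt when at most
    `n_batch` tokens are processed per call."""
    if n_batch <= 0:
        raise ValueError("n_batch must be positive")
    return math.ceil(prompt_tokens / n_batch)


if __name__ == "__main__":
    # Hypothetical 3000-token prompt; 512 is only a comparison value,
    # 1024 is the value reported in the comment above.
    for n_batch in (512, 1024):
        print(n_batch, prompt_batches(3000, n_batch))
```

Whether fewer, larger batches are actually faster depends on the hardware and on available memory, which may be why results differ between machines.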