Extension long_replies slows down output and cpu

djbritt commented 7 months ago

Describe the bug

I had many conversations with the long_replies extension turned on and set to 400, using these two models: loyalmacaroni-7b and sonya-7b.

As their replies reached 400, CPU usage dropped and the output speed went way down.

When I turn off the extension, CPU usage doesn't drop and the output speed doesn't slow down.

Is there an existing issue for this?

[X] I have searched the existing issues

Reproduction

Load one of the models mentioned above, set max_new_tokens to 4096, increase min length (below the main input bar) to 400, and try to get a response that reaches that length. Then check whether your output slows down and CPU usage drops.
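To put numbers on the slowdown instead of judging it by eye, a small helper along these lines could be used. This is only a sketch: it assumes you can record one wall-clock timestamp per generated token yourself, and the function names and the `timestamps` list are hypothetical, not part of text-generation-webui or the long_replies extension.

```python
from typing import List


def windowed_tokens_per_second(timestamps: List[float], window: int = 50) -> List[float]:
    """Return tokens/s for each consecutive window of `window` tokens.

    `timestamps` holds one wall-clock time (in seconds) per generated token
    (hypothetical input; you have to collect it yourself).
    """
    rates = []
    for start in range(0, len(timestamps) - window, window):
        elapsed = timestamps[start + window] - timestamps[start]
        if elapsed > 0:
            rates.append(window / elapsed)
    return rates


def is_slowing_down(rates: List[float], tolerance: float = 0.9) -> bool:
    """True if the last window is slower than `tolerance` times the first one."""
    return len(rates) >= 2 and rates[-1] < rates[0] * tolerance
```

Running the same prompt once with the extension enabled and once with it disabled, then comparing the two lists of rates, would show whether the drop really follows the extension or appears in both runs.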

Screenshot

Screenshot from 2024-04-05 20-55-24

This is a normal output without the extension enabled. When it is enabled, it shows a downward slope instead, slowly decreasing.

Logs

na

System Info

Ubuntu 22.02
MSI - Bravo 15 15.6" 144hz Gaming Laptop FHD - Ryzen 5-7535HS with 16GB RAM - GeForce RTX 4050 with 6G GDDR6 - 1TB NVMe SSD - Black

djbritt commented 7 months ago

Ah, sorry, I think this is actually a limitation of my PC. I am getting weird slowdowns even with the plugin disabled.

djbritt commented 7 months ago

@oobabooga - I increased n_batch to 1024, and this issue went away. In your wiki, under "4 - model tab", you say that increasing n_batch hasn't given you a speed increase, but I can tell you it helped me.

oobabooga / text-generation-webui