Closed djbritt closed 7 months ago
Ah, sorry, I think this is actually a limitation of my PC. I am getting a weird slowdown even with the plugin disabled.
@oobabooga - I increased n_batch to 1024, and this issue went away. In your wiki, under "4 - model tab", you say that increasing n_batch hasn't given you a speed increase, but I can tell you it helped me.
Describe the bug
I had many conversations with long replies turned on and set to 400, using these two models: loyalmacaroni-7b and sonya-7b.
As their content reached 400, CPU usage dropped, and the output speed went way down.
When I turn off the extension, neither the CPU usage nor the output speed drops.
Is there an existing issue for this?
Reproduction
Load one of the models mentioned above, set max_new_tokens to 4096, and increase min length (below the main input bar) to 400. Then try to get a response that long, and see whether your output slows down and CPU usage goes down.
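To put a number on the slowdown rather than eyeballing it, something like the rough sketch below might help. It is not part of the web UI or the extension: `windowed_rates` and `slows_down` are hypothetical helper names, and it assumes you have recorded one timestamp per generated token yourself (for example with `time.perf_counter()` each time a token streams in).

```python
from typing import List


def windowed_rates(timestamps: List[float], window: int = 50) -> List[float]:
    """Return tokens per second for each consecutive block of `window` tokens.

    `timestamps` holds one arrival time (in seconds) per generated token.
    """
    rates = []
    for start in range(0, len(timestamps) - window, window):
        elapsed = timestamps[start + window] - timestamps[start]
        if elapsed > 0:  # skip degenerate windows with no measurable time
            rates.append(window / elapsed)
    return rates


def slows_down(rates: List[float], threshold: float = 0.5) -> bool:
    """True if the last window is slower than `threshold` times the first."""
    return len(rates) >= 2 and rates[-1] < threshold * rates[0]
```

Comparing the rates from a run with the extension enabled against a run with it disabled should show whether the drop is tied to the extension or happens either way. The 50-token window and the 0.5 threshold are arbitrary choices.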
Screenshot
This is normal output without the extension enabled. When it is enabled, this becomes a downward slope, slowly decreasing.
Logs
System Info