vladmandic / automatic

SD.Next: Advanced Implementation of Generative Image Models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: after first generation, it takes a while for the image to finish loading on the UI #506

Closed gsgoldma closed 1 year ago

gsgoldma commented 1 year ago

Issue Description

This only occurs with the first image that the program makes. All the subsequent ones immediately load into the UI.

This is not the initialization at startup: after the image shows 100% completion in the console, it may take 20-30 seconds for it to appear in the WebUI.

Version Platform Description

Windows 10

jvivian commented 1 year ago

@gsgoldma - are you by chance using WSL2? I had that problem, but after changing some of the CUDA settings and pulling the latest code I no longer do. What CUDA settings are you using?

gsgoldma commented 1 year ago

CUDA 11.8, no WSL2

inck86 commented 1 year ago

image shows 100% completion in the console, it may take 20-30 seconds

I have the same trouble.

jvivian commented 1 year ago

Do either of you have the CUDA option Enable cuDNN benchmark feature enabled? Try disabling that or the Use channels last as torch memory format option.
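For anyone who wants to map those two UI options onto raw PyTorch, they presumably correspond to the standard torch settings below; this is a minimal, illustrative sketch (the model and tensor here are stand-ins, not SD.Next code):

```python
import torch

# cuDNN autotuning: when True, cuDNN benchmarks several algorithms on
# first use and caches the fastest one; that benchmarking is the
# one-time delay discussed in this issue.
torch.backends.cudnn.benchmark = False  # roughly what disabling the UI option does

# Channels-last memory format: stores tensors as NHWC instead of NCHW,
# which can be faster on some GPUs but is another knob worth toggling
# when debugging stalls.
model = torch.nn.Conv2d(3, 8, kernel_size=3)          # stand-in for the real UNet
model = model.to(memory_format=torch.channels_last)    # enable channels-last
x = torch.randn(1, 3, 64, 64).to(memory_format=torch.channels_last)
y = model(x)
```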

gsgoldma commented 1 year ago

Do either of you have the CUDA option Enable cuDNN benchmark feature enabled? Try disabling that or the Use channels last as torch memory format option.

Yes, and that worked, thank you! It reduced the initialization time beforehand too!

inck86 commented 1 year ago

Try disabling that

Confirmed, it helped! Thanks!

vladmandic commented 1 year ago

do note that the Enable cuDNN benchmark feature is disabled by default for a reason: what it does is tell CUDA to try all the different options for optimizing math operations before selecting the best one, so it's totally expected that the initial execution will have a delay (while CUDA is actually benchmarking itself internally)
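the effect is easy to reproduce outside the app; a standalone sketch (not SD.Next code, shapes are arbitrary) showing that the first call with a given shape pays the autotuning cost and later calls reuse the cached choice:

```python
import time
import torch

torch.backends.cudnn.benchmark = True  # same setting as the UI option

device = "cuda" if torch.cuda.is_available() else "cpu"
conv = torch.nn.Conv2d(4, 320, kernel_size=3, padding=1).to(device)
x = torch.randn(2, 4, 64, 64, device=device)

for i in range(3):
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.perf_counter()
    conv(x)
    if device == "cuda":
        torch.cuda.synchronize()
    print(f"call {i}: {time.perf_counter() - start:.4f}s")
# on a CUDA GPU the first call is typically much slower than the rest,
# because cuDNN is trying out algorithms for this shape
```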

is there anything else to be done in this issue?

inck86 commented 1 year ago

is there anything else to be done in this issue?

Is it possible to run the benchmark once programmatically, record the results, and reuse them afterwards? Does it make sense to search for the best option on the same hardware every time?

vladmandic commented 1 year ago

that is a totally valid request and i have that conversation open with the torch team :) it's not something that can be done at the app level.

gsgoldma commented 1 year ago

that is a totally valid request and i have that conversation open with the torch team :) it's not something that can be done at the app level.


is there anything else to be done in this issue?

Is it possible to run the benchmark once programmatically, record the results, and reuse them afterwards? Does it make sense to search for the best option on the same hardware every time?

And it does work well once it's completed: I went from 1.66 it/s to over 2 it/s. What's weird is that it generates the image first and then freezes, instead of freezing before making it.

vladmandic commented 1 year ago

depending on circumstances, it will run the benchmark before/during/after generation, since those stages trigger different torch operations. it also depends on which sampler you use - for example, unipc triggers the denoiser at the end only, so if the benchmark is optimizing ops inside the denoiser, it will appear slow/stuck near the end only.
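this is also why the stall can move around: with benchmark enabled, cuDNN re-runs its autotuning whenever an op is first seen with a new input shape, so whichever stage first hits a new shape is where the pause shows up. a hedged, generic illustration (not SD.Next internals, shapes are arbitrary):

```python
import time
import torch

torch.backends.cudnn.benchmark = True
device = "cuda" if torch.cuda.is_available() else "cpu"
conv = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1).to(device)

def timed(x):
    # measure one forward pass, synchronizing so GPU time is included
    if device == "cuda":
        torch.cuda.synchronize()
    t = time.perf_counter()
    conv(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return time.perf_counter() - t

a = torch.randn(1, 4, 64, 64, device=device)
b = torch.randn(1, 4, 96, 96, device=device)   # different spatial size

print("shape A, 1st call:", timed(a))  # slow: autotune for this shape
print("shape A, 2nd call:", timed(a))  # fast: cached algorithm reused
print("shape B, 1st call:", timed(b))  # slow again: new shape, new autotune
```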

vladmandic commented 1 year ago

btw, i'll close the issue as the root cause has been found, but feel free to post updates and i'll reopen if needed.