vladmandic / sd-extension-system-info

System and platform info and standardized benchmarking extension for SD.Next and WebUI
https://vladmandic.github.io/sd-extension-system-info/pages/benchmark.html
MIT License

Time it takes to generate image vs it/s #34

Closed Jordain closed 1 year ago

Jordain commented 1 year ago

Hi, I was testing the differences between the command line args --opt-sdp-attention and --xformers in Automatic1111. With --xformers, the time it takes to generate an image was a bit faster, but it scored lower on the benchmark. With --opt-sdp-attention, generation was a bit slower, but it scored a lot better on the benchmark.

I'm wondering why --opt-sdp-attention wouldn't run faster if it has a higher it/s than --xformers?
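For context, the two flags select different attention implementations: --opt-sdp-attention uses PyTorch 2.0's built-in torch.nn.functional.scaled_dot_product_attention, while --xformers uses xformers.ops.memory_efficient_attention. A minimal sketch of a micro-benchmark of just the attention kernels (assuming a CUDA GPU, torch >= 2.0, and xformers installed; the tensor shapes are arbitrary):

```python
import time
import torch
import torch.nn.functional as F
import xformers.ops as xops

# 3-D (batch, seq_len, head_dim) tensors are accepted by both kernels
q, k, v = (torch.randn(8, 4096, 64, device="cuda", dtype=torch.float16)
           for _ in range(3))

def bench(fn, iters=50):
    fn()                          # warm-up run
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    torch.cuda.synchronize()      # wait for all queued GPU work to finish
    return (time.perf_counter() - t0) / iters

print("sdp:     ", bench(lambda: F.scaled_dot_product_attention(q, k, v)))
print("xformers:", bench(lambda: xops.memory_efficient_attention(q, k, v)))
```

Kernel speed alone doesn't determine end-to-end generation time, though, which is what the discussion below gets at.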

vladmandic commented 1 year ago

can you share actual results? it's nearly impossible to say anything based on this.

Jordain commented 1 year ago

These are the results from --opt-sdp-attention:

2023-09-26 18:26:24.571427 | 16.46 / 18.89 / 23.12 | app:stable-diffusion-webui.git updated:2023-08-31 hash:5ef669de url:https://github.com/AUTOMATIC1111/stable-diffusion-webui.git/tree/master | arch:AMD64 cpu:Intel64 Family 6 Model 183 Stepping 1, GenuineIntel system:Windows release:Windows-10-10.0.22621-SP0 python:3.10.6 | torch:2.0.1+cu118 autocast half xformers:0.0.20 diffusers: transformers:4.30.2 | device:NVIDIA GeForce RTX 4090 (1) (compute_37) (8, 9) cuda:11.8 cudnn:8700 driver:531.79 24GB | sdp none | sd_xl_base_1.0_0.9vae.safetensors [e6bb9ea85b] | jorda |   | 47ada2

These are the results from --xformers:

2023-09-26 18:56:52.545432 | 13.18 / 20.21 / 23.29 | app:stable-diffusion-webui.git updated:2023-08-31 hash:5ef669de url:https://github.com/AUTOMATIC1111/stable-diffusion-webui.git/tree/master | arch:AMD64 cpu:Intel64 Family 6 Model 183 Stepping 1, GenuineIntel system:Windows release:Windows-10-10.0.22621-SP0 python:3.10.6 | torch:2.0.1+cu118 autocast half xformers:0.0.20 diffusers: transformers:4.30.2 | device:NVIDIA GeForce RTX 4090 (1) (compute_37) (8, 9) cuda:11.8 cudnn:8700 driver:531.79 24GB | xformers none | sd_xl_base_1.0_0.9vae.safetensors [e6bb9ea85b] | jorda |   | daddca

xformers had a slightly better generation time than --opt-sdp-attention, even though the benchmark says sdp has the higher it/s.

vladmandic commented 1 year ago

the progress bar in the console shows real-time it/s for each step, but only for that step. benchmark results are measured end-to-end, so there is init / preprocess / actual run / postprocess. you need to compare all those steps. i can't dig into a1111, but it sounds like the init time with xformers was substantial, so the overall result is lower.
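To illustrate with made-up numbers (not measurements), a fixed startup overhead can flip which backend wins on wall-clock time even when its per-step it/s is higher:

```python
steps = 20  # sampling steps in one generation

def effective_its(step_its, overhead_s):
    """End-to-end it/s once one-time init/pre/post overhead is counted."""
    sampling_s = steps / step_its
    return steps / (sampling_s + overhead_s)

# higher per-step rate, but more startup overhead:
print(effective_its(step_its=18.0, overhead_s=1.5))  # ~7.7 it/s effective
# lower per-step rate, but less startup overhead:
print(effective_its(step_its=15.0, overhead_s=0.5))  # ~10.9 it/s effective
```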

Jordain commented 1 year ago

Thanks for your reply, Vlad. Sorry, I'm trying to understand your post. Does that mean that Auto1111 does not include the init time in the time displayed after the image is generated?

So does that mean the performance of --opt-sdp-attention is better than --xformers?

I am just testing image generation with a batch size of 1 to record the times, and looking at the difference between 16.46 and 13.18 it/s, I would assume there would be a difference in the time it takes to generate that one image. However, the generation times are quite similar, with --xformers at 13.18 it/s slightly beating --opt-sdp-attention at 16.46 it/s.

I feel like I am missing something or I am not understanding the it/s benchmark output.

vladmandic commented 1 year ago

It means that the it/s in the console log progress bar is only partial information and needs to be aggregated with the full run info.

Jordain commented 1 year ago

Ok thanks, I understand that. I am looking at the WebUI for the time it takes, instead of the console it/s times. This is where I see the time output:

[screenshot of the WebUI time output]

vladmandic commented 1 year ago

i would have to dig into how webui measures that time, but as far as i know it looks at operation begin/end, which again does not include all init time (but does include pre/post processing).
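A toy sketch of what each timer covers (the functions are hypothetical stand-ins; the sleeps simulate work):

```python
import time

def load_model():   # hypothetical stand-in for one-time init
    time.sleep(2.0)

def generate():     # hypothetical stand-in for preprocess + sampling + postprocess
    time.sleep(5.0)

t0 = time.perf_counter()
load_model()        # init: outside a begin/end timer around the operation
t1 = time.perf_counter()
generate()          # what the begin/end timer actually measures
t2 = time.perf_counter()

print(f"begin/end timer: {t2 - t1:.1f}s, end-to-end: {t2 - t0:.1f}s")
```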

Jordain commented 1 year ago

I was looking at the benchmark results again and I noticed that xformers still shows up (xformers:0.0.20) under libraries for both the --opt-sdp-attention and --xformers runs. Do you think that might be why the times are similar for both? I can try uninstalling xformers and see if I get different results.

vladmandic commented 1 year ago

a lot of libraries are going to force-load xformers even if you chose not to use it. so yes, the best bet is not to have it installed if you're not using it.
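One quick way to check (standard library only) whether xformers is installed in the webui's Python environment, and whether something has already imported it into the running process:

```python
import importlib.util
import sys

installed = importlib.util.find_spec("xformers") is not None  # on disk?
loaded = "xformers" in sys.modules                             # imported?
print(f"installed: {installed}, loaded: {loaded}")
```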

Jordain commented 1 year ago

I deleted xformers to see if it changed the performance or benchmark, but it stayed around the same. I did notice something else, though: when running --opt-sdp-attention it only used 61.5% of the GPU, vs --xformers which used 70.7%.

Is there a way to get --opt-sdp-attention to 70.7%, the same as --xformers? If I did that, would it improve the speed?

vladmandic commented 1 year ago

the fact that sdp uses less gpu than xformers doesn't mean it's less efficient; it's all about how your cpu is feeding the gpu. and sdp and xformers split some operations between gpu and cpu differently, so they don't behave identically. i wrote extensively in the past about which is better for which type of hardware combination.

if your gpu utilization is low, then your cpu is not feeding data fast enough. that is typical for high-end gpus, since a normal 512x512 batch-size 1 run is never going to saturate the gpu, but it happens in many other scenarios too. (it would be different if your gpu utilization was below 40%; that would nearly always indicate a problem)
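One way to watch utilization while a generation runs is to sample it with pynvml (from the nvidia-ml-py package) in a second terminal; a rough sketch:

```python
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU
for _ in range(30):                            # ~15 s of samples
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"gpu: {util.gpu}%  mem: {util.memory}%")
    time.sleep(0.5)
pynvml.nvmlShutdown()
```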

Jordain commented 1 year ago

Thanks for the info. Do you have a link to your post about which is better?

vladmandic commented 1 year ago

read through those threads:

Jordain commented 1 year ago

Thanks!

Jordain commented 1 year ago

I read through your threads, and I decided to try out sdp and xformers on my older machine that has a GTX 1060. You mentioned that xformers would work better on older machines, but when I tested, the sdp results were slightly faster than xformers.

In one of your posts you mentioned that xformers uses some CPU as well as GPU to process the image. Is sdp faster because my GTX 1060 is that much better than my i5 CPU?

Here are the specs:

[screenshots of CPU and GPU specs]

vladmandic commented 1 year ago

> Is sdp faster because my gtx 1060 is that much better than my i5 CPU?

correct. it's not the fact that it's an i5, but that it's haswell, a somewhat older generation with lower bandwidth, so it cannot feed the gpu efficiently enough.