Hi!
I just tested your approach, but I got a worse response time.
I'm opening this issue because there may be something wrong in my code or logic,
or I may be using these tools inappropriately.
Environment:
Ubuntu 18.04
T4
torch == 1.11.0+cu113
optimum == 1.4.0
onnx == 1.12.0
Python 3.8.10
triton 22.01
I converted stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py, used your model directory after fixing some pbtxt dimensions,
and added the line noise_pred = noise_pred.to("cuda") at link.
The Triton server then ran as shown below.
Then I ran inference with these prompts:
prompts = [
"A man standing with a red umbrella",
"A child standing with a green umbrella",
"A woman standing with a yellow umbrella"
]
I got a response after 6.8 seconds (average of 3 inferences).
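For context, the 6.8 s figure above was an average over 3 inferences. A minimal sketch of how such a measurement can be taken (here `infer_fn` is a placeholder for the actual Triton client call, which is not shown in this issue):

```python
import time
from statistics import mean

def average_latency(infer_fn, prompts, runs=3):
    """Time `runs` passes over all prompts and return the mean seconds per pass."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for prompt in prompts:
            infer_fn(prompt)  # placeholder: e.g. a tritonclient infer request
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

This only averages wall-clock time; it does not separate queueing, network, or GPU execution time, which Triton's own metrics endpoint can break down further.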
The strange thing is that when I send the same prompts to StableDiffusionPipeline, it takes about 5 seconds.
Of course, this was done in the same environment and also served from Triton Inference Server
(though I did maximize StableDiffusionPipeline's performance with some tips from the diffusers docs link).
Is serving the Stable Diffusion model as ONNX actually better than using StableDiffusionPipeline?
I expected better performance, given how much harder it is to serve this way.