kamalkraj / stable-diffusion-tritonserver

Deploy stable diffusion model with ONNX/TensorRT + Triton Server
Apache License 2.0

stabilityai/stable-diffusion-2-1-base got worse response time than StableDiffusionPipeline #10

Open · tofulim opened this issue 1 year ago

tofulim commented 1 year ago

Hi! I just tested your approach, but I got a worse response time. I'm opening this issue because there may be something wrong in my code or logic, or I may be using these tools incorrectly.

environment

I exported stabilityai/stable-diffusion-2-1-base to ONNX with convert_stable_diffusion_checkpoint_to_onnx.py and used your model directory, fixing some dimensions in the pbtxt files.
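
For reference, the export command was roughly the following (using the standard arguments of the diffusers conversion script; the output path is just an example):

    python convert_stable_diffusion_checkpoint_to_onnx.py \
        --model_path stabilityai/stable-diffusion-2-1-base \
        --output_path ./sd-2-1-base-onnx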

I also added this line, noise_pred = noise_pred.to("cuda"), at link.
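
To show where that cast goes, here is a rough sketch of the denoising loop (not the exact code from this repo; triton_unet_infer is a placeholder for the Triton UNet request, which returns its result on CPU):

    import torch

    def denoise(latents, text_embeddings, scheduler, guidance_scale, triton_unet_infer):
        # triton_unet_infer stands in for the Triton UNet call; its output
        # arrives as a CPU tensor, while the latents live on "cuda".
        for t in scheduler.timesteps:
            latent_model_input = torch.cat([latents] * 2)
            latent_model_input = scheduler.scale_model_input(latent_model_input, t)

            noise_pred = triton_unet_infer(latent_model_input, t, text_embeddings)
            noise_pred = noise_pred.to("cuda")  # the line I added

            # classifier-free guidance on the two halves of the batch
            noise_pred_uncond, noise_pred_text = noise_pred.chunk(2)
            noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

            latents = scheduler.step(noise_pred, t, latents).prev_sample
        return latents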

The Triton server then started and ran as shown below.

[screenshot of the running Triton server]

Then I ran inference with these prompts:

prompts = [
    "A man standing with a red umbrella",
    "A child standing with a green umbrella",
    "A woman standing with a yellow umbrella"
]

and got a response after about 6.8 seconds (average over 3 inferences).
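
For that number, I simply timed each request end to end, roughly like this (generate is a placeholder for my Triton-backed text-to-image call; prompts is the list above):

    import time

    latencies = []
    for prompt in prompts:
        start = time.perf_counter()
        image = generate(prompt)  # one full text-to-image request against Triton
        latencies.append(time.perf_counter() - start)

    print(sum(latencies) / len(latencies))  # ~6.8 s on my setup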

The strange thing is that when I pass the same prompts to StableDiffusionPipeline, it takes around 5 seconds. Of course this was run in the same environment, and it is also served from Triton Inference Server (but I maximized StableDiffusionPipeline's performance with some tips from the diffusers docs link).
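
That baseline pipeline was configured roughly like this, following the usual speed tips from the diffusers docs (fp16 weights on GPU plus attention slicing; the serving wrapper around it is omitted, and this may not be the exact combination of tips I used):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base",
        torch_dtype=torch.float16,  # half-precision weights
    )
    pipe = pipe.to("cuda")
    pipe.enable_attention_slicing()  # one of the memory/speed tips from the docs

    image = pipe("A man standing with a red umbrella").images[0]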

Is serving the Stable Diffusion model via ONNX actually better than using StableDiffusionPipeline? I expected better performance, given how much harder it is to serve.