Hi!
I just tested your approach, but I got a worse response time.
I'm opening this issue because there may be something wrong in my code or logic,
or I may be using these tools inappropriately.
Environment:
Ubuntu 18.04
T4
torch == 1.11.0+cu113
optimum == 1.4.0
onnx == 1.12.0
Python 3.8.10
triton 22.01
I converted stabilityai/stable-diffusion-2-1-base with convert_stable_diffusion_checkpoint_to_onnx.py, used your model directory after fixing some pbtxt dimensions,
and added the line noise_pred = noise_pred.to("cuda") at link.
The Triton server then ran as shown below.
Then I ran inference with these prompts:
prompts = [
"A man standing with a red umbrella",
"A child standing with a green umbrella",
"A woman standing with a yellow umbrella"
]
I got a response after 6.8 seconds (average of 3 inferences).
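For context, the 6.8 s figure above was an average over 3 inferences. A minimal sketch of how such a measurement can be taken (here `infer_fn` is a placeholder for the actual Triton client call, which is not shown in this issue):

```python
import time
from statistics import mean

def average_latency(infer_fn, prompts, runs=3):
    """Time `runs` passes over all prompts and return the mean seconds per pass."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        for prompt in prompts:
            infer_fn(prompt)  # placeholder: e.g. a tritonclient infer request
        timings.append(time.perf_counter() - start)
    return mean(timings)
```

This only averages wall-clock time; it does not separate queueing, network, or GPU execution time, which Triton's own metrics endpoint can break down further.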
The strange thing is that when I send the same prompts to StableDiffusionPipeline, it takes about 5 seconds.
Of course, this was done in the same environment and also served from Triton Inference Server
(though I did maximize StableDiffusionPipeline's performance with some tips from the diffusers docs link).
Is serving the Stable Diffusion model as ONNX actually better than using StableDiffusionPipeline?
I expected better performance, given how much harder it is to serve this way.