mlc-ai / web-stable-diffusion

Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
https://mlc.ai/web-stable-diffusion
Apache License 2.0
3.51k stars · 220 forks

Huge performance gap between TVM and TRT on Stable Diffusion v1.5 #46

Open felixslu opened 1 year ago

felixslu commented 1 year ago

GPU: Nvidia RTX 3090TI.

  1. First, I used the tuning log database in the repo; it took 3.7 s to generate a result.
  2. Then I tried tuning myself with meta-schedule (trial count set to 50,000), which brought it down to 2.5 s.

But on TensorRT v8.6, one UNet iteration takes only 25 ms, versus 96 ms with TVM (USE_CUBLAS=ON; USE_CUDNN=ON; CUDA version 12.1).
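To put the per-iteration gap in perspective, here is a back-of-envelope projection of total UNet time. The 96 ms and 25 ms figures are the ones reported above; the 50-step denoising schedule is an assumption for illustration, not a number from this thread:

```python
# Per-iteration latencies reported in this issue.
TVM_MS_PER_ITER = 96   # TVM, tuned with meta-schedule
TRT_MS_PER_ITER = 25   # TensorRT v8.6

# Hypothetical denoising schedule length (not stated in the issue).
STEPS = 50

# Project total UNet time over one image generation.
tvm_total_s = TVM_MS_PER_ITER * STEPS / 1000
trt_total_s = TRT_MS_PER_ITER * STEPS / 1000

print(f"TVM UNet time:      {tvm_total_s:.2f} s")
print(f"TensorRT UNet time: {trt_total_s:.2f} s")
print(f"Per-iteration gap:  {TVM_MS_PER_ITER / TRT_MS_PER_ITER:.1f}x")
```

At 50 steps the per-iteration difference alone accounts for several seconds of end-to-end latency, which is consistent with the gap described above.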

I wonder why the latency gap on the stable diffusion model is so large between TVM and TensorRT. BTW, a few weeks ago I got a different result between TVM and TRT: an in-house model auto-tuned by TVM achieved excellent inference latency, almost on par with TensorRT 8.5.
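When comparing numbers like these across runtimes, measurement methodology matters: warmup iterations and device synchronization before reading the clock can shift results considerably. A minimal timing harness, sketched here with a CPU placeholder workload standing in for one UNet iteration (the real benchmark would also synchronize the GPU, e.g. `dev.sync()` in TVM):

```python
import time

def bench(fn, warmup=3, iters=20):
    """Return the mean wall-clock latency of fn() in milliseconds.

    A real GPU benchmark must also synchronize the device before
    reading the clock (e.g. dev.sync() in TVM); this sketch times
    a CPU placeholder, so no synchronization is needed.
    """
    for _ in range(warmup):          # warm caches / trigger any JIT
        fn()
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000 / iters

# Placeholder workload standing in for one UNet iteration.
latency_ms = bench(lambda: sum(i * i for i in range(10_000)))
print(f"{latency_ms:.3f} ms/iter")
```

Reporting the mean over many iterations after warmup makes the TVM and TensorRT figures directly comparable.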

Do you have any ideas about this? Thanks in advance.

Civitasv commented 1 year ago

Same for me.