First, I used the tuning log database shipped in the repo, which gave me a result of 3.7 s. Then I tuned the model myself with meta-schedule (trial count set to 50,000), which brought it down to 2.5 s.
However, on TensorRT v8.6, one iteration of the UNet takes only 25 ms, versus 96 ms with TVM (USE_CUBLAS=ON, USE_CUDNN=ON, CUDA 12.1).
I wonder why the latency gap for the Stable Diffusion model is so huge between TVM and TensorRT. By the way, a few weeks ago I got a different result when comparing TVM and TRT: an in-house model auto-tuned by TVM achieved excellent inference latency, almost on par with TensorRT 8.5.
GPU: NVIDIA RTX 3090 Ti.
Do you have any ideas about this? Thanks in advance.