Closed: lileilai closed this issue 2 years ago.
That latency is typical of the first run. You should warm up with at least 5 iterations, then measure latency by averaging the next 50 iterations (this is what our latency benchmark code does).
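The warmup-then-average procedure can be sketched as a small timing helper. This is a generic illustration, not the repo's actual benchmark code: `benchmark` and the placeholder workload are hypothetical names, and for real GPU measurements you would wrap the timed region with `torch.cuda.synchronize()`.

```python
import time

def benchmark(fn, warmup=5, iters=50):
    """Return the average latency of fn() in seconds,
    discarding the first `warmup` runs."""
    for _ in range(warmup):
        fn()  # warmup: triggers lazy init, JIT, cache fills
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Placeholder workload; replace with your pipeline call,
# e.g. lambda: pipe(prompt). For CUDA, call
# torch.cuda.synchronize() before reading the clock.
latency = benchmark(lambda: sum(range(10_000)))
print(f"avg latency: {latency * 1e3:.3f} ms")
```

Averaging only post-warmup iterations excludes one-time costs (engine deserialization, kernel autotuning, memory allocation) that dominate the first run.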
You can also compare against our Stochasticx CLI deployment, which already supports TensorRT and AITemplate on A100; follow our instructions (here). The commands to deploy TensorRT/AITemplate on your machine are:
I have tried the process described in this code repo for TensorRT, but I cannot reproduce the TensorRT latency on A100: my fp16 result is about 3.2, higher than the number you posted.