rhysdg opened 1 month ago
Tracking at a comparable ~0.25s now with custom ops etc - vanilla GDINO in PyTorch is also ~0.25s. Making progress
150ms is now achievable with the TensorRT execution provider after warmup
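For anyone reproducing the numbers above: the TensorRT EP builds/loads its engine and autotunes on the first few calls, so you only see the ~150ms figure after warmup. A minimal sketch of the measurement pattern (helper name is mine; `run` would wrap your `session.run(...)` call):

```python
import time


def timed_after_warmup(run, warmup=3, iters=10):
    """Average latency of `run()` in seconds, excluding warmup iterations.

    The warmup calls let the TensorRT execution provider finish engine
    building / kernel autotuning before we start the clock.
    """
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters
```

Timing without the warmup loop makes TRT look far slower than the CUDA EP on the first call, which is just engine-build cost, not steady-state inference.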
Worth noting that this is with an Ampere GPU - T4s in Colab have horrendous performance
FP16 takes a heavy hit in inference quality with the `TensorrtExecutionProvider` compared to straight ONNX with the CUDA execution provider - heading through an opset analysis etc
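For reference, this is roughly how the two setups being compared can be configured in ONNX Runtime - a sketch, with the helper name mine; `trt_fp16_enable` is the TensorRT EP session option that toggles the FP16 path taking the quality hit above:

```python
def make_providers(use_trt, trt_fp16=False):
    """Build an ONNX Runtime provider priority list.

    use_trt=True  -> TensorRT EP first (optionally with its FP16 path),
                     falling back to CUDA, then CPU.
    use_trt=False -> plain CUDA EP, the baseline being compared against.
    """
    if use_trt:
        return [
            ("TensorrtExecutionProvider", {"trt_fp16_enable": trt_fp16}),
            "CUDAExecutionProvider",
            "CPUExecutionProvider",
        ]
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


# Usage (assuming `model_path` points at the exported GDINO ONNX graph):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       model_path, providers=make_providers(use_trt=True, trt_fp16=False)
#   )
```

Keeping `trt_fp16_enable` off sidesteps the quality regression at the cost of the FP16 speedup.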