rhysdg opened 1 month ago
Tracking at a comparable ~0.25s now with custom ops etc - vanilla GDINO in PyTorch is also ~0.25s. Making progress
150ms is now achievable with the TensorRT execution provider after warmup
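For anyone reproducing the numbers above: the TensorRT EP builds/loads its engine and autotunes on the first few calls, so you only see the ~150ms figure after warmup. A minimal sketch of the measurement pattern (helper name is mine; `run` would wrap your `session.run(...)` call):

```python
import time


def timed_after_warmup(run, warmup=3, iters=10):
    """Average latency of `run()` in seconds, excluding warmup iterations.

    The warmup calls let the TensorRT execution provider finish engine
    building / kernel autotuning before we start the clock.
    """
    for _ in range(warmup):
        run()
    start = time.perf_counter()
    for _ in range(iters):
        run()
    return (time.perf_counter() - start) / iters
```

Timing without the warmup loop makes TRT look far slower than the CUDA EP on the first call, which is just engine-build cost, not steady-state inference.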
Worth noting that this is with an Ampere GPU - T4s in Colab have horrendous performance
FP16 takes a heavy hit in inference quality with the `TensorrtExecutionProvider` compared to straight ONNX with the CUDA execution provider - heading through an opset analysis etc
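For reference, this is roughly how the two setups being compared can be configured in ONNX Runtime - a sketch, with the helper name mine; `trt_fp16_enable` is the TensorRT EP session option that toggles the FP16 path taking the quality hit above:

```python
def make_providers(use_trt, trt_fp16=False):
    """Build an ONNX Runtime provider priority list.

    use_trt=True  -> TensorRT EP first (optionally with its FP16 path),
                     falling back to CUDA, then CPU.
    use_trt=False -> plain CUDA EP, the baseline being compared against.
    """
    if use_trt:
        return [
            ("TensorrtExecutionProvider", {"trt_fp16_enable": trt_fp16}),
            "CUDAExecutionProvider",
            "CPUExecutionProvider",
        ]
    return ["CUDAExecutionProvider", "CPUExecutionProvider"]


# Usage (assuming `model_path` points at the exported GDINO ONNX graph):
#   import onnxruntime as ort
#   session = ort.InferenceSession(
#       model_path, providers=make_providers(use_trt=True, trt_fp16=False)
#   )
```

Keeping `trt_fp16_enable` off sidesteps the quality regression at the cost of the FP16 speedup.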