Hey there @Masrur02! Ah, interesting - I just ran on my setup with CUDA 12.4, TensorRT 10.0.1.6, and onnxruntime-gpu 1.18.1 without any issues.
Can you tell me a bit more about your setup? Hardware accelerator, TensorRT and onnxruntime versions, operating system, etc.
I'm working on a version that leverages torch.compile's tensorrt backend now too, so fingers crossed that, combined with the TensorRT Model Optimizer library, will get us some extended op support!
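For the curious, the shape of that experiment is roughly the following - a sketch assuming the torch_tensorrt package is installed (installing it registers the backend with torch.compile); the toy model is just a placeholder, not the actual Grounding DINO graph:

import torch
import torch.nn as nn
import torch_tensorrt  # noqa: F401 - registers the "tensorrt" compile backend

# Placeholder model standing in for the torch Grounding DINO graph
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU()).eval().cuda()

# torch.compile hands the captured graph to TensorRT; ops TensorRT can't
# handle fall back to eager PyTorch rather than failing the whole conversion
compiled = torch.compile(
    model,
    backend="tensorrt",
    options={"enabled_precisions": {torch.half}},
)

with torch.no_grad():
    out = compiled(torch.randn(1, 64, device="cuda"))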
Hi, I am using CUDA 12.1 on Ubuntu 22.04, my GPU is an NVIDIA RTX 4090, and this is my setup:
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
bzip2 1.0.8 h5eee18b_6
ca-certificates 2024.7.2 h06a4308_0
certifi 2024.8.30 pypi_0 pypi
gast 0.6.0 pypi_0 pypi
gdown 5.2.0 pypi_0 pypi
gradio 3.26.0 pypi_0 pypi
gradio-client 0.1.2 pypi_0 pypi
huggingface-hub 0.24.6 pypi_0 pypi
idna 3.8 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1
libffi 3.3 he6710b0_2
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libstdcxx-ng 11.2.0 h1234567_1
libuuid 1.41.5 h5eee18b_0
matplotlib 3.9.2 pypi_0 pypi
ncurses 6.4 h6a678d5_0
numpy 1.26.0 pypi_0 pypi
nvidia-cudnn-cu12 8.9.2.26 pypi_0 pypi
onnxruntime-extensions 0.12.0 pypi_0 pypi
onnxruntime-gpu 1.18.0 pypi_0 pypi
opencv-python-headless 4.10.0.84 pypi_0 pypi
openssl 1.1.1w h7f8727e_0
pillow 10.3.0 pypi_0 pypi
pip 24.2 py310h06a4308_0
protobuf 3.20.3 pypi_0 pypi
pydantic 2.8.2 pypi_0 pypi
pydantic-core 2.20.1 pypi_0 pypi
python 3.10.0 h12debd9_5
pytz 2024.1 pypi_0 pypi
readline 8.2 h5eee18b_0
regex 2024.5.15 pypi_0 pypi
scipy 1.13.1 pypi_0 pypi
setuptools 72.1.0 py310h06a4308_0
sqlite 3.45.3 h5eee18b_0
tensorrt 10.3.0 pypi_0 pypi
tensorrt-cu12 10.3.0 pypi_0 pypi
tensorrt-cu12-bindings 10.3.0 pypi_0 pypi
tensorrt-cu12-libs 10.3.0 pypi_0 pypi
tk 8.6.14 h39e8969_0
torch 2.3.1 pypi_0 pypi
torchvision 0.18.1 pypi_0 pypi
triton 2.3.1 pypi_0 pypi
tzdata 2024a h04d1e81_0
wheel 0.43.0 py310h06a4308_0
xz 5.4.6 h5eee18b_1
zlib 1.2.13 h5eee18b_1
The interesting thing is that if I use trt=False to avoid the TensorRT execution provider, the code works. However, if I use trt=True, I get this error.
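(For context, the trt flag presumably just controls which execution provider onnxruntime tries first - a sketch of that pattern, not the repo's actual code; the cache path is a placeholder:)

import onnxruntime as ort

def make_session(onnx_path: str, trt: bool = True) -> ort.InferenceSession:
    """Build an ORT session preferring TensorRT when trt=True, CUDA otherwise."""
    if trt:
        providers = [
            ("TensorrtExecutionProvider", {
                "trt_engine_cache_enable": True,   # reuse built engines
                "trt_engine_cache_path": "./trt_cache",
            }),
            "CUDAExecutionProvider",  # fallback for ops TensorRT can't take
            "CPUExecutionProvider",
        ]
    else:
        providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return ort.InferenceSession(onnx_path, providers=providers)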
Did you specify a shape while converting the ONNX model? Or do you use a specific shape when running inference with the TensorRT execution provider?
Hi, it seems the ONNX model you converted is dynamic - there is no specified input shape - but the TensorRT execution provider expects a specified shape. However, according to this, we need to run the shape inference script. I have run it and can now use trt=True with the shape-inferred ONNX model, but the FPS has not been boosted much: with your ONNX model and trt=False the FPS is 4.64 on my PC, and with the inferred ONNX model and trt=True it is 5.60.
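For reference, the shape inference step I ran is roughly the following - a sketch using the SymbolicShapeInference helper bundled with onnxruntime; the file paths are placeholders:

import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("gdino.onnx")  # placeholder path to the exported model
# auto_merge=True merges conflicting symbolic dims instead of failing
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "gdino_inferred.onnx")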
However, the visualization result is weird.
Do you have any comments on it? Can you please suggest something? And can you give your opinion on whether the boosted FPS is reasonable or it should be faster?
Hey there @Masrur02! That is an odd result indeed - I wonder if the BERT separator ([SEP]) token is ending up in the prediction visualisation too.
I'm seeing if I can reproduce it now, but to respond to your first couple of questions: for sure, as I was discussing over at the original repo, performance takes a bit of a hit with the TensorRT execution provider backend for onnxruntime due to some unsupported ops. I'm working on two things right now - pulling out BERT, as it has full support for conversion, and evaluating whether or not torch_tensorrt with the new TensorRT Model Optimizer will make a difference.
My questions for you are: are you running this code here? Have you made any modifications? Are those latency benchmarks in seconds?
It's worth noting that I ran into problems with 'trt_fp16_enable': True as well, which looked similar to the bizarre inference above, minus the [SEP] token.
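For the record, that experiment was just flipping the fp16 flag in the TensorRT provider options, along these lines (a sketch; the cache path is a placeholder and the surrounding session setup is omitted):

# TensorRT EP options passed to onnxruntime's InferenceSession; setting
# trt_fp16_enable to True is what produced the garbled inference for me
trt_options = {
    "trt_fp16_enable": True,          # build half-precision engines
    "trt_engine_cache_enable": True,  # cache built engines between runs
    "trt_engine_cache_path": "./trt_cache",
}
providers = [
    ("TensorrtExecutionProvider", trt_options),
    "CUDAExecutionProvider",  # fallback for unsupported ops
]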
Hey, so here's a shared Colab notebook so we're meeting in the middle - inference quality looks good, but the speed with a T4 over my RTX 3080 leaves much to be desired:
https://colab.research.google.com/drive/1Km1FzY1aeezu1G8GKNU1PZDeWmqvyUAO?usp=sharing
I'll debug the TRT setup in Colab shortly to check if we can reproduce that crazy inference.
Yes, I am running it here. However, I have modified gdino.model.py to read the model weights from a local directory, since I had to convert your ONNX model to a shape-inferred ONNX model using https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/symbolic_shape_infer.py
Aha! I've managed to reproduce the shape inference issue you were originally seeing - in my last update I accidentally sent up the raw conversion before running it through onnx simplifier. I ran a before and after, and that resolves it; you shouldn't need to run it through the symbolic shape inference tool, it'll just be plug and play.
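For anyone reproducing that step, the simplifier pass is roughly the following (a sketch using the onnxsim package, i.e. pip's onnx-simplifier; file names are placeholders):

import onnx
from onnxsim import simplify

model = onnx.load("gdino_raw.onnx")  # placeholder path to the raw export
model_simplified, check = simplify(model)
assert check, "simplified model failed validation"
onnx.save(model_simplified, "gdino.onnx")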
I'm sending up the working version now and a quick PR with the DL link - once that's done I'll give you a shout and we'll see if that works out for you!
Thank you for the catch!
The results I'm getting are the following too:
Excellent. Please send it to me once you are done; it will be really helpful for me.
You've got it
Where? Should I download it from your repo?
A bunch of pytests are running on the merge right now. Once it's merged successfully, just head to gdino/data, delete your current TensorRT cache and any .onnx files you have, then run the example I provided you. The correct version will download and begin converting to a TensorRT engine.
Just noting once again that conversion isn't perfect with unsupported ops right now - I have a bunch of projects to complete at the moment, but I'm eager to get back to it. I'll have a mixed-precision fallback built in with the torch model soon, and I'm hoping we can get the whole way with all the latest updates available to us.
Alright, merged to main and all yours @Masrur02 - checking for reproducibility from scratch now.
Cool, successful from scratch with correct inference - Python packages are:
tensorrt==10.1.0 onnxruntime==1.18.1
in a Python 3.10 virtual env.
Hi, with the current setup I get this error:
INFO:root:Available providers for ONNXRuntime:
EP Error
EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:754 std::unique_ptr
However, if I comment out trt_engine_hw_compatible, the code works well. Now I do not need to infer the shapes. Thank you so much! Let me check the FPS now.
Ah, good to know! Looks like a deprecation between versions, huh.
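If anyone else hits the same EP Error, one way to stay compatible is to gate the option on the onnxruntime release - a sketch only, and note the exact version cutoff for trt_engine_hw_compatible is my assumption from this thread, not something I've verified:

from packaging import version
import onnxruntime as ort

trt_options = {"trt_engine_cache_enable": True}
# trt_engine_hw_compatible raised an EP Error on 1.18.0 here but worked on
# 1.18.1; treat the cutoff below as an assumption and adjust if needed
if version.parse(ort.__version__) >= version.parse("1.18.1"):
    trt_options["trt_engine_hw_compatible"] = True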
Yes probably. Anyways you have really done a great job.
Ah, I appreciate the kind words, and wishing you luck for now! I'm closing this issue, but feel free of course to open more and I'll be there as soon as I can. Have a good weekend!
Hi, I was trying to use the Grounded SAM2 repo (https://github.com/IDEA-Research/Grounded-SAM-2). There, the FPS for the Grounding DINO block was around 10 and for SAM2 around 200 on my setup. When I use your ONNX model for DINO, its FPS improves to 30, but the FPS for SAM2 then drops to around 18, so the overall FPS does not improve. I have tried torch.cuda.synchronize(); after adding it, the individual inference FPS of SAM2 increased, but the overall FPS is still lower.
import cv2
import torch
import numpy as np
import supervision as sv
from torchvision.ops import box_convert
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from grounding_dino.groundingdino.util.inference import load_model, load_image, predict
import time
import os
from on.gdino.model import OnnxGDINO
#from on.utils.gdino_utils import load_image2, viz

# Environment settings
# Use bfloat16 only where supported

# Build SAM2 image predictor
sam2_checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
sam2_model = build_sam2(model_cfg, sam2_checkpoint, device="cuda")
sam2_predictor = SAM2ImagePredictor(sam2_model)

# Build Grounding DINO model
#model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda" if torch.cuda.is_available() else "cpu"
'''grounding_model = load_model(
    model_config_path="grounding_dino/groundingdino/config/GroundingDINO_SwinT_OGC.py",
    model_checkpoint_path="gdino_checkpoints/groundingdino_swint_ogc.pth",
    device=device
)'''
ogd = OnnxGDINO(type='gdino_fp32', trt=True)
payload = ogd.preprocess_query("road. car")

# Setup the input text prompt for Grounding DINO
text = "road. car."
output_dir = "test"
os.makedirs(output_dir, exist_ok=True)

# Capture video
video_path = 'notebooks/videos/a.webm'
cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_FPS, 5)
frame_num = 0

while cap.isOpened():
    ot = time.time()
    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.resize(frame, (640, 480))
    #time.sleep(0.1)

    # Convert the frame to the required format for processing
    image_source, image, image_transformed = load_image(frame)
    #img, image_transformed = load_image2(frame)
    end_time0 = time.time()
    fps0 = 1 / (end_time0 - ot)
    print(f"FPS Capturing for frame {frame_num}: {fps0:.2f}")

    # Grounding DINO inference via the ONNX model
    start_time = time.time()
    boxes, predicted_phrases = ogd.inference(
        image_transformed.astype(np.float32),
        payload,
        text_threshold=0.25,
        box_threshold=0.35,
    )
    # torch.cuda.synchronize(device="cuda")
    end_time = time.time()
    fps = 1 / (end_time - start_time)
    print(f"FPS for DINO frame {frame_num}: {fps:.2f}")

    sam2_predictor.set_image(image_source)
    labels = [phrase.split('(')[0] for phrase in predicted_phrases]
    confidences = [float(phrase.split('(')[1][:-1]) for phrase in predicted_phrases]
    confidences = torch.tensor(confidences)

    # Process the box prompt for SAM2
    h, w, _ = frame.shape
    boxes = boxes * torch.Tensor([w, h, w, h])
    input_boxes = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()

    start_time2 = time.time()
    # Enable mixed precision only for this specific block
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        if torch.cuda.get_device_properties(0).major >= 8:
            # Enable tfloat32 for Ampere GPUs
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True
        # Perform SAM2 prediction within the mixed-precision context
        masks, scores, logits = sam2_predictor.predict(
            point_coords=None,
            point_labels=None,
            box=input_boxes,
            multimask_output=False,
        )
    # torch.cuda.synchronize()
    end_time2 = time.time()
    fps2 = 1 / (end_time2 - start_time2)
    print(f"FPS2 for SAM frame {frame_num}: {fps2:.2f}")

    start_time3 = time.time()
    # Post-process the output of the model to get the masks, scores, and logits for visualization
    if masks.ndim == 4:
        masks = masks.squeeze(1)
    confidences = confidences.numpy().tolist()
    class_names = labels
    class_ids = np.array(list(range(len(class_names))))
    labels = [
        f"{class_name} {confidence:.2f}"
        for class_name, confidence
        in zip(class_names, confidences)
    ]

    # Visualize image with supervision API
    detections = sv.Detections(
        xyxy=input_boxes,         # (n, 4)
        mask=masks.astype(bool),  # (n, h, w)
        class_id=class_ids
    )
    box_annotator = sv.BoxAnnotator()
    annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
    label_annotator = sv.LabelAnnotator()
    annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)
    mask_annotator = sv.MaskAnnotator()
    annotated_frame = mask_annotator.annotate(scene=annotated_frame, detections=detections)
    mask_image_save_path = os.path.join(output_dir, f"{frame_num:04d}_mask.jpg")
    cv2.imwrite(mask_image_save_path, annotated_frame)
    end_time3 = time.time()
    fps3 = 1 / (end_time3 - start_time3)
    print(f"FPS3 vis for frame {frame_num}: {fps3:.2f}")

    et = time.time()
    of = 1 / (et - ot)
    print(f"OF Overall for frame {frame_num}: {of:.2f}")
    print()
    frame_num += 1

cap.release()
cv2.destroyAllWindows()
This is my code; it is a modified version of https://github.com/IDEA-Research/Grounded-SAM-2/blob/main/grounded_sam2_local_demo.py
Do you have any suggestions or ideas on this issue?
TIA
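One note on the timing in the script above: CUDA kernels launch asynchronously, so wrapping an async call in time.time() mostly measures launch overhead, and the real GPU work gets billed to whichever later call synchronizes - which is likely why SAM2 appears slower once DINO gets faster. A sketch of a fairer per-stage timer using CUDA events (assuming measurements on a CUDA device):

import torch

def cuda_timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds) measured with CUDA events."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    result = fn(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()  # flush all queued work before reading the events
    return result, start.elapsed_time(end)

# Hypothetical usage with the predictor from the script above:
# masks_scores_logits, ms = cuda_timed(sam2_predictor.predict, box=input_boxes,
#                                      point_coords=None, point_labels=None,
#                                      multimask_output=False)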
Hi, I ran the Grounding DINO inference code, and I am getting this error:
How can I solve this issue? TIA