rhysdg / vision-at-a-clip

Low-latency ONNX and TensorRT based zero-shot classification and detection with contrastive language-image pre-training based prompts

Runtime Exception Error #27

Closed Masrur02 closed 1 week ago

Masrur02 commented 2 weeks ago

Hi, I ran the Grounding DINO inference code and I am getting this error:

INFO:root:Available providers for ONNXRuntime: 
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 694483386
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:604] Reading dangerously large protocol message.  If the message turns out to be larger than 2147483647 bytes, parsing will be halted for security reasons.  To increase the limit (or to disable these warnings), see CodedInputStream::SetTotalBytesLimit() in google/protobuf/io/coded_stream.h.
[libprotobuf WARNING google/protobuf/io/coded_stream.cc:81] The total number of bytes read was 694483386
2024-08-24 14:39:07.422348500 [E:onnxruntime:, inference_session.cc:2105 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:2220 SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const [ONNXRuntimeError] : 1 : FAIL : TensorRT input: /backbone/backbone.0/Transpose_output_0 has no shape specified. Please run shape inference on the onnx model first. Details can be found in https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs

Traceback (most recent call last):
  File "/home/ubuntu/Khan/Grounded-SAM-2/te.py", line 64, in <module>
    ogd = OnnxGDINO(type='gdino_fp32', device='cuda', trt=True)
  File "/home/ubuntu/Khan/Grounded-SAM-2/on/gdino/model.py", line 96, in __init__
    self.model = self._load_model(model_dir)
  File "/home/ubuntu/Khan/Grounded-SAM-2/on/gdino/model.py", line 156, in _load_model
    session = ort.InferenceSession(
  File "/home/ubuntu/miniconda3/envs/G/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/home/ubuntu/miniconda3/envs/G/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 491, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.cc:2220 SubGraphCollection_t onnxruntime::TensorrtExecutionProvider::GetSupportedList(SubGraphCollection_t, int, int, const onnxruntime::GraphViewer&, bool*) const [ONNXRuntimeError] : 1 : FAIL : TensorRT input: /backbone/backbone.0/Transpose_output_0 has no shape specified. Please run shape inference on the onnx model first. Details can be found in https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#shape-inference-for-tensorrt-subgraphs

How can I solve this issue? TIA

rhysdg commented 2 weeks ago

Hey there @Masrur02! Ah, interesting - I just ran on my setup with CUDA 12.4, TensorRT 10.0.1.6, and onnxruntime-gpu 1.18.1 without any issues.

Can you tell me a bit more about your setup? Hardware accelerator, TensorRT and onnxruntime versions, operating system, etc.?

I'm working on a version that leverages torch.compile's tensorrt backend now too, so fingers crossed that, combined with the TensorRT Model Optimizer library, will get us some extended op support!
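Roughly speaking, that experiment looks like the sketch below - a minimal, hedged example only, assuming torch_tensorrt is installed (importing it registers a "tensorrt" backend for torch.compile); the tiny Sequential module is just a stand-in for the actual Grounding DINO graph:

import torch
import torch_tensorrt  # noqa: F401 - importing this registers the "tensorrt" backend

# Stand-in module; in practice this would be the PyTorch Grounding DINO model.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval().cuda()

# Compile against the TensorRT backend and run a dummy forward pass.
compiled = torch.compile(model, backend="tensorrt")
with torch.no_grad():
    out = compiled(torch.randn(1, 3, 224, 224, device="cuda"))
print(out.shape)

The hope is that unsupported ops can fall back to eager PyTorch rather than blocking the whole conversion.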

Masrur02 commented 2 weeks ago

Hi, I am using CUDA 12.1 on Ubuntu 22.04, my GPU is an NVIDIA RTX 4090, and this is my setup:

_libgcc_mutex             0.1                        main  
_openmp_mutex             5.1                       1_gnu  
bzip2                     1.0.8                h5eee18b_6  
ca-certificates           2024.7.2             h06a4308_0  
certifi                   2024.8.30                pypi_0    pypi
gast                      0.6.0                    pypi_0    pypi
gdown                     5.2.0                    pypi_0    pypi
gradio                    3.26.0                   pypi_0    pypi
gradio-client             0.1.2                    pypi_0    pypi
huggingface-hub           0.24.6                   pypi_0    pypi
idna                      3.8                      pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1  
libffi                    3.3                  he6710b0_2  
libgcc-ng                 11.2.0               h1234567_1  
libgomp                   11.2.0               h1234567_1  
libstdcxx-ng              11.2.0               h1234567_1  
libuuid                   1.41.5               h5eee18b_0  
matplotlib                3.9.2                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0  
numpy                     1.26.0                   pypi_0    pypi
nvidia-cudnn-cu12         8.9.2.26                 pypi_0    pypi
onnxruntime-extensions    0.12.0                   pypi_0    pypi
onnxruntime-gpu           1.18.0                   pypi_0    pypi
opencv-python-headless    4.10.0.84                pypi_0    pypi
openssl                   1.1.1w               h7f8727e_0  
pillow                    10.3.0                   pypi_0    pypi
pip                       24.2            py310h06a4308_0  
protobuf                  3.20.3                   pypi_0    pypi
pydantic                  2.8.2                    pypi_0    pypi
pydantic-core             2.20.1                   pypi_0    pypi
python                    3.10.0               h12debd9_5  
pytz                      2024.1                   pypi_0    pypi
readline                  8.2                  h5eee18b_0  
regex                     2024.5.15                pypi_0    pypi
scipy                     1.13.1                   pypi_0    pypi
setuptools                72.1.0          py310h06a4308_0  
sqlite                    3.45.3               h5eee18b_0  
tensorrt                  10.3.0                   pypi_0    pypi
tensorrt-cu12             10.3.0                   pypi_0    pypi
tensorrt-cu12-bindings    10.3.0                   pypi_0    pypi
tensorrt-cu12-libs        10.3.0                   pypi_0    pypi
tk                        8.6.14               h39e8969_0  
torch                     2.3.1                    pypi_0    pypi
torchvision               0.18.1                   pypi_0    pypi
triton                    2.3.1                    pypi_0    pypi
tzdata                    2024a                h04d1e81_0  
wheel                     0.43.0          py310h06a4308_0  
xz                        5.4.6                h5eee18b_1  
zlib                      1.2.13               h5eee18b_1 

Masrur02 commented 1 week ago

The interesting thing is that if I use trt=False to avoid the TensorRT execution provider, the code works. However, if I use trt=True, I get this error.
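To be concrete about what the flag changes on my side, the two cases boil down to roughly this (a simplified sketch, not the repo's exact code; the model path and cache directory are placeholders):

import onnxruntime as ort

model_path = "gdino_fp32.onnx"  # placeholder path to the exported model

trt = True  # the flag in question
providers = ["CUDAExecutionProvider", "CPUExecutionProvider"]
if trt:
    # Put the TensorRT EP first so ONNX Runtime tries it before falling back to CUDA/CPU.
    providers.insert(0, ("TensorrtExecutionProvider",
                         {"trt_engine_cache_enable": True,
                          "trt_engine_cache_path": "./trt_cache"}))

# With trt=False this initialises fine; with trt=True the TensorRT EP rejects the
# model because /backbone/backbone.0/Transpose_output_0 has no inferred shape.
session = ort.InferenceSession(model_path, providers=providers)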

Masrur02 commented 1 week ago

Did you specify any shape while converting the ONNX model? Or do you use any specific shape when running inference with the TensorRT execution provider?

Masrur02 commented 1 week ago

Hi, it seems the ONNX model you converted is dynamic: there is no specified input shape, but the TensorRT execution provider expects a specified shape. However, according to this, we need to run the shape inference code. I have run it and can now use trt=True with the shape-inferred ONNX model, but the FPS has not been boosted much: with your ONNX model and trt=False the FPS is 4.64 on my PC, and with the inferred ONNX model and trt=True the FPS is 5.60.
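For reference, the shape-inference step I ran is essentially the onnxruntime symbolic shape inference tool, something like the sketch below (file names are mine, not from the repo):

import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

# Propagate (symbolic) shapes through the whole graph so every intermediate
# tensor carries a shape, which is what the TensorRT EP is asking for.
model = onnx.load("gdino_fp32.onnx")                       # placeholder input file
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "gdino_fp32_inferred.onnx")            # placeholder output file

# Equivalent command line:
#   python -m onnxruntime.tools.symbolic_shape_infer \
#       --input gdino_fp32.onnx --output gdino_fp32_inferred.onnx --auto_merge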

However, the visualization result is weird (attached image: pred).

Do you have any comments on it? Can you please suggest something? And could you give your opinion on whether the boosted FPS is reasonable, or whether it should be faster?

rhysdg commented 1 week ago

Hey there @Masrur02! That is an odd result indeed - I wonder if the BERT separator ([SEP]) token is ending up in the prediction visualisation too.

I'm seeing if I can reproduce it now, but to respond to your first couple of questions - for sure, as I was discussing over at the original repo, performance takes a bit of a hit with the TensorRT execution provider backend for onnxruntime due to some unsupported ops. I'm working on two things right now: pulling out BERT, since it has full support for conversion, and evaluating whether torch_tensorrt with the new TensorRT Model Optimizer will make a difference.

My questions for you are: are you running this code here? Have you made any modifications? Are those latency benchmarks in seconds?

rhysdg commented 1 week ago

It's worth noting that I also ran into problems with 'trt_fp16_enable': True that looked similar to the bizarre inference above, minus the [SEP].

rhysdg commented 1 week ago

Hey, so here's a shared Colab notebook so we're meeting in the middle - inference quality looks good, but the speed on a T4 versus my RTX 3080 leaves much to be desired.

https://colab.research.google.com/drive/1Km1FzY1aeezu1G8GKNU1PZDeWmqvyUAO?usp=sharing

I'll debug the TRT setup in Colab shortly to check whether we can reproduce that crazy inference.

Masrur02 commented 1 week ago

Hey there @Masrur02! That is an odd result indeed - I wonder if the BERT separator ([SEP]) token is ending up in the prediction visualisation too.

I'm seeing if I can reproduce it now, but to respond to your first couple of questions - for sure, as I was discussing over at the original repo, performance takes a bit of a hit with the TensorRT execution provider backend for onnxruntime due to some unsupported ops. I'm working on two things right now: pulling out BERT, since it has full support for conversion, and evaluating whether torch_tensorrt with the new TensorRT Model Optimizer will make a difference.

My questions for you are: are you running this code here? Have you made any modifications? Are those latency benchmarks in seconds?

Yes, I am running it here. However, I have modified gdino/model.py to read the model weights from a local directory, since I had to convert your ONNX model to a shape-inferred ONNX model using https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/symbolic_shape_infer.py

rhysdg commented 1 week ago

Aha! I've managed to reproduce the shape inference issue you were originally receiving - in my last update I accidentally sent up the raw conversion before running it through onnx-simplifier. I ran a before and after, and that resolves it: you shouldn't need to run it through the symbolic shape inference tool, it'll just be plug and play.
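For reference, the simplification pass is just the standard onnx-simplifier flow, roughly the sketch below (file names are placeholders; the onnxsim CLI does the same thing):

import onnx
from onnxsim import simplify

# Load the raw torch.onnx export, fold constants and infer shapes, and check the
# simplified graph is numerically equivalent before saving it.
model = onnx.load("gdino_raw.onnx")              # placeholder: raw export
model_simplified, ok = simplify(model)
assert ok, "simplified model failed the equivalence check"
onnx.save(model_simplified, "gdino_fp32.onnx")   # placeholder: simplified model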

I'm sending up the working version now along with a quick PR with the download link - once that's done I'll give you a shout and let's see if that works out for you!

Thank you for the catch!

rhysdg commented 1 week ago

The results I'm getting are the following too (attached image: pred).

Masrur02 commented 1 week ago

Aha! I've managed to reproduce the shape inference issue you were originally receiving - in my last update I accidentally sent up the raw conversion before running it through onnx-simplifier. I ran a before and after, and that resolves it: you shouldn't need to run it through the symbolic shape inference tool, it'll just be plug and play.

I'm sending up the working version now along with a quick PR with the download link - once that's done I'll give you a shout and let's see if that works out for you!

Thank you for the catch!

Excellent. Please send it to me once you are done; it will be really helpful for me.

rhysdg commented 1 week ago

Aha! I've managed to reproduce the shape inference issue you were originally receiving - in my last update I accidentally sent up the raw conversion before running it through onnx-simplifier. I ran a before and after, and that resolves it: you shouldn't need to run it through the symbolic shape inference tool, it'll just be plug and play. I'm sending up the working version now along with a quick PR with the download link - once that's done I'll give you a shout and let's see if that works out for you! Thank you for the catch!

Excellent. Please send it to me once you are done; it will be really helpful for me.

You've got it

Masrur02 commented 1 week ago

Where? Should I download it from your repo?

rhysdg commented 1 week ago

Where? Should I download it from your repo?

A bunch of pytests are running on the merge right now. Once it's merged successfully, just head to gdino/data, delete your current TensorRT cache and any .onnx files you have, then run the example I provided you. The correct version will download and begin converting to a TensorRT engine.

Just noting once again that conversion isn't perfect with unsupported ops right now - I have a bunch of projects to complete at the moment, but I'm eager to get back to it. I'll have a mixed-precision fallback built in with the torch model soon, and I'm hoping we can get the whole way with all the latest updates available to us.

rhysdg commented 1 week ago

Alright, merged to main and all yours @Masrur02 - checking for reproducibility from scratch now.

rhysdg commented 1 week ago

Cool, successful from scratch with correct inference - Python packages are:

tensorrt==10.1.0 onnxruntime==1.18.1

In a Python 3.10 virtual env

Masrur02 commented 1 week ago

Hi, with the current setup I get this error:

INFO:root:Available providers for ONNXRuntime: EP Error EP Error /onnxruntime_src/onnxruntime/python/onnxruntime_pybind_state.cc:754 std::unique_ptr onnxruntime::python::CreateExecutionProviderInstance(const onnxruntime::SessionOptions&, const string&, const ProviderOptionsMap&) Invalid TensorRT EP option: trt_engine_hw_compatible when using [('TensorrtExecutionProvider', {'trt_engine_cache_enable': True, 'trt_max_workspace_size': 4294967296, 'trt_engine_cache_path': '/home/soicroot/Downloads/Khan/Grounded-SAM-2/on/gdino/data', 'trt_engine_hw_compatible': True, 'trt_sparsity_enable': True, 'trt_build_heuristics_enable': True, 'trt_builder_optimization_level': 0, 'trt_fp16_enable': True}), 'CUDAExecutionProvider', 'CPUExecutionProvider'] Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.

However, if I comment out trt_engine_hw_compatible, the code works well. Now I do not need to infer the shapes. Thank you so much. Let me check the FPS now.
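In case it helps anyone else, the working provider configuration on my side ends up looking roughly like this - the same options as in the error above, just without trt_engine_hw_compatible, which my onnxruntime-gpu 1.18.0 build does not recognise (the model path is a placeholder):

import onnxruntime as ort

trt_options = {
    "trt_engine_cache_enable": True,
    "trt_max_workspace_size": 4294967296,
    "trt_engine_cache_path": "./on/gdino/data",
    # "trt_engine_hw_compatible": True,  # rejected by this onnxruntime-gpu build
    "trt_sparsity_enable": True,
    "trt_build_heuristics_enable": True,
    "trt_builder_optimization_level": 0,
    "trt_fp16_enable": True,
}

session = ort.InferenceSession(
    "gdino_fp32.onnx",  # placeholder model path
    providers=[
        ("TensorrtExecutionProvider", trt_options),
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    ],
)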

rhysdg commented 1 week ago

Ah, good to know! Looks like a deprecation between versions, huh.

Masrur02 commented 1 week ago

Yes, probably. Anyway, you have really done a great job.

rhysdg commented 1 week ago

Yes, probably. Anyway, you have really done a great job.

Ah, I appreciate the kind words and wish you luck! I'm closing this issue for now, but of course feel free to open more and I'll be there as soon as I can. Have a good weekend!

Masrur02 commented 1 week ago

Hi, I was trying to use the Grounded SAM 2 repo (https://github.com/IDEA-Research/Grounded-SAM-2). There, the Grounding DINO block was around 10 FPS and SAM 2 around 200 FPS on my setup. When I use your ONNX model for DINO, its FPS improves to 30, but the FPS for SAM 2 drops to around 18, so the overall FPS does not improve. I have tried torch.cuda.synchronize(); with it, the reported FPS of the individual SAM 2 inference increases, but the overall FPS is still low.

import cv2
import torch
import numpy as np
import supervision as sv
from torchvision.ops import box_convert
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor
from grounding_dino.groundingdino.util.inference import load_model, load_image, predict
import time
import os
from on.gdino.model import OnnxGDINO
#from on.utils.gdino_utils import load_image2, viz

# Environment settings
# Use bfloat16 only where supported

# Build SAM2 image predictor
sam2_checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
sam2_model = build_sam2(model_cfg, sam2_checkpoint, device="cuda")
sam2_predictor = SAM2ImagePredictor(sam2_model)

# Build Grounding DINO model
#model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda" if torch.cuda.is_available() else "cpu"
'''grounding_model = load_model(
    model_config_path="grounding_dino/groundingdino/config/GroundingDINO_SwinT_OGC.py", 
    model_checkpoint_path="gdino_checkpoints/groundingdino_swint_ogc.pth",
    device=device
)'''
ogd = OnnxGDINO(type='gdino_fp32', trt=True)

payload = ogd.preprocess_query("road. car")
# Setup the input text prompt for Grounding DINO
text = "road. car."
output_dir = "test"
os.makedirs(output_dir, exist_ok=True)

# Capture video
video_path = 'notebooks/videos/a.webm'
cap = cv2.VideoCapture(video_path)
cap.set(cv2.CAP_PROP_FPS, 5)
frame_num = 0

while cap.isOpened():
    ot = time.time()

    ret, frame = cap.read()
    if not ret:
        break
    #frame = cv2.resize(frame, (640, 480))

    # time.sleep(0.1)

    # Convert the frame to the required format for processing
    image_source, image, image_transformed = load_image(frame)
    #img,image_transformed=load_image2(frame)

    end_time0 = time.time()
    fps0 = 1 / (end_time0 - ot)

    print(f"FPS Capturing for frame {frame_num}: {fps0:.2f}")

    start_time = time.time()
    boxes, predicted_phrases = ogd.inference(
        image_transformed.astype(np.float32),
        payload,
        text_threshold=0.25,
        box_threshold=0.35,
    )
    # torch.cuda.synchronize(device="cuda")

    end_time = time.time()
    fps = 1 / (end_time - start_time)

    print(f"FPS for DINO frame {frame_num}: {fps:.2f}")

    sam2_predictor.set_image(image_source)

    labels = [phrase.split('(')[0] for phrase in predicted_phrases]
    confidences = [float(phrase.split('(')[1][:-1]) for phrase in predicted_phrases]

    confidences = torch.tensor(confidences)

    # Process the box prompt for SAM2
    h, w, _ = frame.shape
    boxes = boxes * torch.Tensor([w, h, w, h])

    input_boxes = box_convert(boxes=boxes, in_fmt="cxcywh", out_fmt="xyxy").numpy()

    start_time2 = time.time()
    # Enable mixed precision only for the specific block
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        if torch.cuda.get_device_properties(0).major >= 8:
            # Enable tfloat32 for Ampere GPUs
            torch.backends.cuda.matmul.allow_tf32 = True
            torch.backends.cudnn.allow_tf32 = True

        # Perform SAM2 prediction within the mixed precision context
        masks, scores, logits = sam2_predictor.predict(
            point_coords=None,
            point_labels=None,
            box=input_boxes,
            multimask_output=False,
        )
    # torch.cuda.synchronize()
    end_time2 = time.time()
    fps2 = 1 / (end_time2 - start_time2)

    print(f"FPS2 for SAM frame {frame_num}: {fps2:.2f}")

    #torch.cuda.synchronize()

    start_time3 = time.time()

    # Post-process the output of the model to get the masks, scores, and logits for visualization
    if masks.ndim == 4:
        masks = masks.squeeze(1)

    confidences = confidences.numpy().tolist()
    class_names = labels
    class_ids = np.array(list(range(len(class_names))))

    labels = [
        f"{class_name} {confidence:.2f}"
        for class_name, confidence
        in zip(class_names, confidences)
    ]

    # Calculate FPS

    # Visualize image with supervision API
    detections = sv.Detections(
        xyxy=input_boxes,  # (n, 4)
        mask=masks.astype(bool),  # (n, h, w)
        class_id=class_ids
    )

    box_annotator = sv.BoxAnnotator()
    annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)

    label_annotator = sv.LabelAnnotator()
    annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections, labels=labels)

    mask_annotator = sv.MaskAnnotator()
    annotated_frame = mask_annotator.annotate(scene=annotated_frame, detections=detections)

    mask_image_save_path = os.path.join(output_dir, f"{frame_num:04d}_mask.jpg")

    cv2.imwrite(mask_image_save_path, annotated_frame)
    end_time3 = time.time()
    fps3 = 1 / (end_time3 - start_time3)

    print(f"FPS3 vis for frame {frame_num}: {fps3:.2f}")
    et = time.time()
    of = 1 / (et - ot)

    print(f"OF Overall for frame {frame_num}: {of:.2f}")
    print()

    frame_num += 1

cap.release()
cv2.destroyAllWindows()

This is my code; it is a modified version of https://github.com/IDEA-Research/Grounded-SAM-2/blob/main/grounded_sam2_local_demo.py
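For what it's worth, the per-block FPS numbers above are taken with plain time.time(); since SAM 2 work is launched asynchronously on the GPU, I am now comparing against a synchronised timing helper like the sketch below (my own convention, not from either repo):

import time
import torch

def timed(fn, *args, **kwargs):
    # Run fn once and return (result, seconds). Synchronising before and after
    # ensures queued CUDA work is not attributed to the wrong block.
    torch.cuda.synchronize()
    start = time.time()
    result = fn(*args, **kwargs)
    torch.cuda.synchronize()
    return result, time.time() - start

# Example usage with the predictor from the script above:
#   (masks, scores, logits), dt = timed(
#       sam2_predictor.predict, box=input_boxes, multimask_output=False)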

Do you have any suggestions or ideas on this issue?

TIA