huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

GroundingDINO cannot work with MiniGPT4 #34885

Open pspdada opened 18 hours ago

pspdada commented 18 hours ago

System Info

Who can help?

@zucchini-nlp @amyeroberts

Information

Tasks

Reproduction

When I use the MiniGPT-4 model from the repository https://github.com/Vision-CAIR/MiniGPT-4, I find that Grounding DINO cannot be used together with it.

Specifically, when I import some required modules from the minigpt4 repository into my project (without doing anything else with the minigpt4 repo) and then run the Transformers Grounding DINO model, the program crashes outright at the model(**encoded_inputs) call with exit code SIG(117), and no traceback or other information is printed.

Other models, such as flan-t5-base-VG-factual-sg, complete their forward pass without crashing even when minigpt4 is imported.

After commenting out the four minigpt4 import lines shown below, the issue disappears entirely.
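One way to try to surface more information from a hard crash like this is Python's built-in faulthandler module, which dumps each thread's Python traceback to stderr when the process receives a fatal signal (SIGSEGV, SIGABRT, SIGBUS, SIGILL, SIGFPE). This is only a sketch: if the native code exits without raising one of those signals, it will print nothing.

import faulthandler

# Must run before the crashing call; on a fatal signal it dumps the
# Python traceback of every thread to stderr.
faulthandler.enable()

The same effect is available by running the script with python -X faulthandler. The full reproduction script follows.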

import torch
from PIL import Image
from transformers import (
    GroundingDinoForObjectDetection,
    GroundingDinoProcessor,
)

# imports modules for registration
from minigpt4.datasets.builders import *  # noqa
from minigpt4.models import *  # noqa
from minigpt4.processors import *  # noqa
from minigpt4.tasks import *  # noqa

image_path = "/root/llm-project/LVLM/eval/Extended_CHAIR/images/chair-500/000000006763.jpg"
image: Image.Image = Image.open(image_path)
model: GroundingDinoForObjectDetection = (
    GroundingDinoForObjectDetection.from_pretrained(
        "IDEA-Research/grounding-dino-base",
        cache_dir="/root/llm-project/utils/models/hub",
        torch_dtype="auto",
        low_cpu_mem_usage=True,
    )
    .to("cuda")
    .eval()
)

processor: GroundingDinoProcessor = GroundingDinoProcessor.from_pretrained(
    "IDEA-Research/grounding-dino-base",
    cache_dir="/root/llm-project/utils/models/hub",
)

text = "man.umbrella.top hat."

with torch.inference_mode():
    encoded_inputs = processor(
        images=image,
        text=text,
        max_length=200,
        return_tensors="pt",
        padding=True,
        truncation=True,
    ).to("cuda")
    outputs = model(**encoded_inputs)  # crashes here with SIG(117) and no traceback when the minigpt4 imports are active
    results = processor.post_process_grounded_object_detection(
        outputs=outputs,
        input_ids=encoded_inputs["input_ids"],
        box_threshold=0.25,
        text_threshold=0.25,
    )
    print(results)
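To narrow down which of the four minigpt4 imports triggers the crash, one could bisect them across fresh interpreter processes. This is a sketch: repro.py is a hypothetical copy of the script above that imports only the module named in sys.argv[1]. Fresh processes matter because a bad import can leave patched global state behind in the current interpreter.

import subprocess
import sys

# Hypothetical bisection driver: run the reproduction once per
# minigpt4 subpackage, each in a fresh interpreter, so a non-zero
# exit code pinpoints the offending import.
for module in [
    "minigpt4.datasets.builders",
    "minigpt4.models",
    "minigpt4.processors",
    "minigpt4.tasks",
]:
    result = subprocess.run([sys.executable, "repro.py", module])
    print(module, "->", "crashed" if result.returncode != 0 else "ok")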

Expected behavior

Since this issue involves another repository, could you help resolve it, or at least guide me on how to find the underlying cause? Combining multiple models is important for my project, but because the crash produces no traceback, I have no starting point for debugging.

qubvel commented 16 hours ago

Hi @pspdada, thanks for reporting the issue! Does it work fine without any imports from minigpt4 in your env?

pspdada commented 10 hours ago

> Hi @pspdada, thanks for reporting the issue! Does it work fine without any imports from minigpt4 in your env?

Yes