huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Loading GroundingDinoForObjectDetection error #32353

Open garychan22 opened 1 month ago

garychan22 commented 1 month ago

System Info

transformers 4.44.0.dev0
python 3.8.19
torch 2.3.1+cu118

Who can help?

@amyeroberts

Information

Tasks

Reproduction

Hi, I am trying grounded_sam as implemented with transformers, following https://github.com/NielsRogge/Transformers-Tutorials/blob/master/Grounding%20DINO/GroundingDINO_with_Segment_Anything.ipynb

However, when I load the model

from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
device = "cuda"
detector_id = detector_id if detector_id is not None else "IDEA-Research/grounding-dino-tiny"
object_detector = pipeline(model=detector_id, task="zero-shot-object-detection", device=device)

the following warning is encountered:

Some weights of the model checkpoint at models--IDEA-Research--grounding-dino-base were not used when initializing GroundingDinoForObjectDetection:
['model.decoder.layers.0.encoder_attn_text.in_proj_bias', 'model.decoder.layers.0.encoder_attn_text.in_proj_weight',
'model.decoder.layers.0.self_attn.in_proj_bias', 'model.decoder.layers.0.self_attn.in_proj_weight',
'model.decoder.layers.1.encoder_attn_text.in_proj_bias', 'model.decoder.layers.1.encoder_attn_text.in_proj_weight',
'model.decoder.layers.1.self_attn.in_proj_bias', 'model.decoder.layers.1.self_attn.in_proj_weight',
'model.decoder.layers.2.encoder_attn_text.in_proj_bias', 'model.decoder.layers.2.encoder_attn_text.in_proj_weight',
'model.decoder.layers.2.self_attn.in_proj_bias', 'model.decoder.layers.2.self_attn.in_proj_weight',
'model.decoder.layers.3.encoder_attn_text.in_proj_bias', 'model.decoder.layers.3.encoder_attn_text.in_proj_weight',
'model.decoder.layers.3.self_attn.in_proj_bias', 'model.decoder.layers.3.self_attn.in_proj_weight',
'model.decoder.layers.4.encoder_attn_text.in_proj_bias', 'model.decoder.layers.4.encoder_attn_text.in_proj_weight',
'model.decoder.layers.4.self_attn.in_proj_bias', 'model.decoder.layers.4.self_attn.in_proj_weight',
'model.decoder.layers.5.encoder_attn_text.in_proj_bias', 'model.decoder.layers.5.encoder_attn_text.in_proj_weight',
'model.decoder.layers.5.self_attn.in_proj_bias', 'model.decoder.layers.5.self_attn.in_proj_weight',
'model.encoder.layers.0.fusion_layer.weight_l', 'model.encoder.layers.0.fusion_layer.weight_v',
'model.encoder.layers.0.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.0.text_enhancer_layer.self_attn.in_proj_weight',
'model.encoder.layers.1.fusion_layer.weight_l', 'model.encoder.layers.1.fusion_layer.weight_v',
'model.encoder.layers.1.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.1.text_enhancer_layer.self_attn.in_proj_weight',
'model.encoder.layers.2.fusion_layer.weight_l', 'model.encoder.layers.2.fusion_layer.weight_v',
'model.encoder.layers.2.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.2.text_enhancer_layer.self_attn.in_proj_weight',
'model.encoder.layers.3.fusion_layer.weight_l', 'model.encoder.layers.3.fusion_layer.weight_v',
'model.encoder.layers.3.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.3.text_enhancer_layer.self_attn.in_proj_weight',
'model.encoder.layers.4.fusion_layer.weight_l', 'model.encoder.layers.4.fusion_layer.weight_v',
'model.encoder.layers.4.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.4.text_enhancer_layer.self_attn.in_proj_weight',
'model.encoder.layers.5.fusion_layer.weight_l', 'model.encoder.layers.5.fusion_layer.weight_v',
'model.encoder.layers.5.text_enhancer_layer.self_attn.in_proj_bias', 'model.encoder.layers.5.text_enhancer_layer.self_attn.in_proj_weight',
'model.input_proj_text.bias', 'model.input_proj_text.weight',
'model.text_backbone.pooler.dense.bias', 'model.text_backbone.pooler.dense.weight']
- This IS expected if you are initializing GroundingDinoForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).

Expected behavior

The model should load without the warning about unused weights.

amyeroberts commented 1 month ago

Hi @garychan22, thanks for opening this issue!

What is the value of detector_id being used (it's ambiguous from the snippet)? Running with "IDEA-Research/grounding-dino-tiny" on main I'm unable to replicate this error.

garychan22 commented 1 month ago

Hi @amyeroberts, the detector_id is a local path to a downloaded copy of IDEA-Research/grounding-dino-tiny

amyeroberts commented 1 month ago

Could you try passing "IDEA-Research/grounding-dino-tiny" directly, i.e.:

from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
device = "cuda"
detector_id = "IDEA-Research/grounding-dino-tiny"
object_detector = pipeline(model=detector_id, task="zero-shot-object-detection", device=device)
garychan22 commented 1 month ago

Emmm... I tried, and the reason I had manually downloaded the pretrained weights is a network issue on our internal server:

requests.exceptions.ConnectionError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443)

So I set HF_HOME and TRANSFORMERS_CACHE to the parent directory of the local model path and renamed the model directory to

models--IDEA-Research--grounding-dino-tiny

but it seems that transformers cannot find the path.

amyeroberts commented 1 month ago

and i set the HF_HOME and TRANSFORMERS_CACHE to the parent dir of the local model path, rename the model dir to

There's no need to do this, and I believe this might be the issue. The models stored in the cache have a specific folder structure which keeps track of which files correspond to which commits on the repo, so that if one updates e.g. the config on the Hub, only the new config is downloaded. Renaming a plain model folder to the cache naming scheme doesn't recreate that internal structure.
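For illustration, the top-level cache folder name is derived from the repo id roughly as follows. This is a simplified sketch, not the actual huggingface_hub implementation; the real cache additionally nests blobs/, refs/, and snapshots/ directories under each of these folders, which is why a renamed plain folder isn't found.

```python
def cache_folder_name(repo_id: str, repo_type: str = "model") -> str:
    # The hub cache prefixes the repo type and replaces "/" with "--",
    # e.g. "IDEA-Research/grounding-dino-tiny"
    #   -> "models--IDEA-Research--grounding-dino-tiny"
    return f"{repo_type}s--" + repo_id.replace("/", "--")

print(cache_folder_name("IDEA-Research/grounding-dino-tiny"))
# -> models--IDEA-Research--grounding-dino-tiny
```

The actual weights then live under a snapshots/&lt;commit-hash&gt;/ subfolder of that directory, not at its top level.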

Instead, if you've downloaded the folder directly, you can just pass in that local folder:

from transformers import AutoModelForMaskGeneration, AutoProcessor, pipeline
device = "cuda"
detector_id = "path/to/model/folder"
object_detector = pipeline(model=detector_id, task="zero-shot-object-detection", device=device)
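As a quick sanity check before passing a local folder to pipeline, you can verify it looks like a complete, plain (non-cache) checkpoint folder. The expected file names below are an assumption about a typical checkpoint layout, not something mandated by this thread:

```python
import os

def looks_like_checkpoint(folder: str) -> bool:
    # A plain model folder should contain a config plus weights in
    # either safetensors or PyTorch bin format at its top level.
    if not os.path.isfile(os.path.join(folder, "config.json")):
        return False
    weight_files = ("model.safetensors", "pytorch_model.bin")
    return any(os.path.isfile(os.path.join(folder, f)) for f in weight_files)
```

If this returns False for your download, the folder is incomplete or you're pointing at the wrong level of the directory tree.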
qubvel commented 1 month ago

I can reproduce the issue, but with device_map provided instead of device.

from transformers import pipeline

object_detector = pipeline(
    model="IDEA-Research/grounding-dino-tiny", task="zero-shot-object-detection", device_map="cuda"
)

The issue appears because tied_params are identified differently depending on whether device_map is specified. I created an issue in Accelerate to clarify the cause: https://github.com/huggingface/accelerate/issues/2984
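For background, "tied" parameters are distinct names that share the same underlying tensor (e.g. input embeddings tied to the output head). A minimal sketch of detecting them by object identity; the real detection in accelerate works on an nn.Module, this only illustrates the idea:

```python
def find_tied(params: dict) -> list:
    # Group parameter names that point at the same underlying object;
    # groups with more than one name are "tied" parameters.
    groups = {}
    for name, tensor in params.items():
        groups.setdefault(id(tensor), []).append(name)
    return [names for names in groups.values() if len(names) > 1]

shared = [1.0, 2.0]  # stand-in for a shared weight tensor
params = {"encoder.embed": shared, "decoder.embed": shared, "head.bias": [0.0]}
print(find_tied(params))
# -> [['encoder.embed', 'decoder.embed']]
```

If two code paths classify the same checkpoint's tied groups differently, one path will report the duplicated names as "unused" weights, which matches the warning above.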