Facing issue, 'The model and loaded state dict do not match exactly' while trying to run the example given in the README.md file

coscotuff commented 8 months ago

Describe the issue

I am currently trying to run Grounding DINO on Google Colab in order to understand more about how the model works and how I can use it. However, when I run the command given in the README, I get the warning 'The model and loaded state dict do not match exactly'. The command is:

!python demo/image_demo.py data/cat/images/IMG_20211205_120756.jpg configs/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py --weights https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth --texts cat.

Any help would be great!

Reproduction

In a cell in Google Colab, paste the following code and run it:

!git clone https://github.com/open-mmlab/mmdetection.git
%cd mmdetection
!pip install -r requirements/multimodal.txt
!wget https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth
!wget https://download.openmmlab.com/mmyolo/data/cat_dataset.zip
%pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu121/torch2.1.0/index.html
!mkdir data
!unzip cat_dataset.zip -d data/cat/
!pip install mmengine
!pip install mmdet
import nltk
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
!python demo/image_demo.py data/cat/images/IMG_20211205_120756.jpg configs/grounding_dino/grounding_dino_swin-t_finetune_8xb2_20e_cat.py --weights https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth --texts cat.

Environment

sys.platform: linux
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0+cu121
PyTorch compiling details: PyTorch built with:

GCC 9.3
C++ Version: 201703
Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
OpenMP 201511 (a.k.a. OpenMP 4.5)
LAPACK is enabled (usually provided by MKL)
NNPACK is enabled
CPU capability usage: AVX512
CUDA Runtime 12.1
NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
CuDNN 8.9.2
Magma 2.6.1
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.16.0+cu121
OpenCV: 4.8.0
MMEngine: 0.10.2
MMDetection: 3.3.0+44ebd17

Results

I am getting the following output:

Loads checkpoint by http backend from path: https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth
Downloading: "https://download.openmmlab.com/mmdetection/v3.0/grounding_dino/groundingdino_swint_ogc_mmdet-822d7e9d.pth" to /root/.cache/torch/hub/checkpoints/groundingdino_swint_ogc_mmdet-822d7e9d.pth
100% 660M/660M [01:04<00:00, 10.7MB/s]
tokenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 147kB/s]
config.json: 100% 570/570 [00:00<00:00, 3.18MB/s]
vocab.txt: 100% 232k/232k [00:00<00:00, 1.71MB/s]
tokenizer.json: 100% 466k/466k [00:00<00:00, 7.07MB/s]
model.safetensors: 100% 440M/440M [00:02<00:00, 205MB/s]
The model and loaded state dict do not match exactly

unexpected key in source state_dict: language_model.language_backbone.body.model.pooler.dense.weight, language_model.language_backbone.body.model.pooler.dense.bias, language_model.language_backbone.body.model.embeddings.position_ids

missing keys in source state_dict: bbox_head.cls_branches.0.log_scale, bbox_head.cls_branches.1.log_scale, bbox_head.cls_branches.2.log_scale, bbox_head.cls_branches.3.log_scale, bbox_head.cls_branches.4.log_scale, bbox_head.cls_branches.5.log_scale, bbox_head.cls_branches.6.log_scale, dn_query_generator.label_embedding.weight

/usr/local/lib/python3.10/dist-packages/mmdet/apis/det_inferencer.py:130: UserWarning: dataset_meta or class names are not saved in the checkpoint's meta data, use COCO classes by default.
  warnings.warn(
01/23 10:58:50 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
/usr/local/lib/python3.10/dist-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
[nltk_data] Downloading package punkt to ~/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     ~/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.
noun_phrases: ['cat']
/usr/local/lib/python3.10/dist-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/usr/local/lib/python3.10/dist-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
  warnings.warn(f'position encoding of key is'
Inference ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   
results have been saved at outputs

hhaAndroid commented 8 months ago

@coscotuff You should use the configuration configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py.
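
For readers unsure how to read the mismatch report: "missing keys" are names the model built from the config expects but the checkpoint does not contain, and "unexpected key" entries are names the checkpoint contains but the model does not use. The log above suggests the finetune config builds parameters (the `log_scale` entries and `dn_query_generator.label_embedding.weight`) that the downloaded checkpoint lacks, which would be consistent with the advice to switch configs. Below is a minimal sketch of that comparison in plain Python. The helper name `diff_state_dict_keys` and the key `backbone.example.weight` are hypothetical placeholders; the other key names are copied from the log.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, in the sense of the report above.

    Hypothetical helper for illustration; it only compares names,
    not tensor shapes or values.
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    # Expected by the model, absent from the checkpoint.
    missing = sorted(model_keys - checkpoint_keys)
    # Present in the checkpoint, not used by the model.
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


# Names the model would expect (abbreviated; first entry is a made-up shared key).
model_keys = [
    "backbone.example.weight",
    "bbox_head.cls_branches.0.log_scale",
    "dn_query_generator.label_embedding.weight",
]

# Names stored in the checkpoint (abbreviated).
checkpoint_keys = [
    "backbone.example.weight",
    "language_model.language_backbone.body.model.embeddings.position_ids",
]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing keys:", missing)
print("unexpected keys:", unexpected)
```

Missing keys are usually the more important half of the report, since those parameters stay at their initial values after loading.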

vakker commented 6 months ago

@hhaAndroid there's still an issue with the position_ids, e.g. see:

$ python demo/image_demo.py image.jpg configs/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py --weights grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth --texts 'bench . car . person . orange . bicycle .'
Loads checkpoint by local backend from path: /home/user/mmdetection/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth
The model and loaded state dict do not match exactly

unexpected key in source state_dict: language_model.language_backbone.body.model.embeddings.position_ids

03/18 14:21:57 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
[nltk_data] Downloading package punkt to ~/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     ~/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
noun_phrases: ['bench', 'car', 'person', 'orange', 'bicycle']
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
  warnings.warn(f'position encoding of key is'

Is that correct?

gurkirt commented 6 months ago

@vakker did you solve it?

vakker commented 6 months ago

@gurkirt no, but I opened another issue #11583 as it seems to be a separate problem. It's actually unclear to me whether this is an actual issue at this point.

gurkirt commented 6 months ago

I guess the position embeddings are fixed and computed on the fly, so there is no need to store them. They might have been part of the parameters at some point, which is why they got stored in those models.
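
One way this guess could play out in plain PyTorch is sketched below. It assumes, and the thread does not confirm this, that `position_ids` was a persistent buffer when the checkpoint was saved and is a non-persistent buffer in the installed library. The class names `OldEmbeddings` and `NewEmbeddings` are hypothetical stand-ins, not the actual language model code.

```python
import torch
from torch import nn


class OldEmbeddings(nn.Module):
    """Stand-in for the module at save time: position_ids is stored in state_dict."""

    def __init__(self, max_len=8):
        super().__init__()
        self.register_buffer("position_ids", torch.arange(max_len).unsqueeze(0))


class NewEmbeddings(nn.Module):
    """Stand-in for the module at load time: position_ids is rebuilt, not stored."""

    def __init__(self, max_len=8):
        super().__init__()
        self.register_buffer(
            "position_ids",
            torch.arange(max_len).unsqueeze(0),
            persistent=False,  # excluded from state_dict
        )


old_state = OldEmbeddings().state_dict()
new_model = NewEmbeddings()

# strict=False reports mismatches instead of raising an error.
result = new_model.load_state_dict(old_state, strict=False)
print("unexpected:", result.unexpected_keys)
print("missing:", result.missing_keys)
```

In this sketch the extra key is ignored and the buffer keeps the value computed in `__init__`, so nothing is lost. Whether the same holds for the checkpoint discussed here is still an assumption.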

vakker commented 6 months ago

Maybe that's the case. Also note that there's this warning during the forward pass: /home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention. Could it be related?

Facing issue, 'The model and loaded state dict do not match exactly' while trying to run the example given in the README.md file #11420