coscotuff closed this issue 7 months ago.
@coscotuff You should use the configuration configs/grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_cap4m.py.
@hhaAndroid There's still an issue with the position_ids, e.g. see:
$ python demo/image_demo.py image.jpg configs/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det.py --weights grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth --texts 'bench . car . person . orange . bicycle .'
Loads checkpoint by local backend from path: /home/user/mmdetection/grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth
The model and loaded state dict do not match exactly
unexpected key in source state_dict: language_model.language_backbone.body.model.embeddings.position_ids
03/18 14:21:57 - mmengine - WARNING - Failed to search registry with scope "mmdet" in the "function" registry tree. As a workaround, the current "function" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmdet" is a correct scope, or whether the registry is initialized.
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the `save_dir` argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
[nltk_data] Downloading package punkt to ~/nltk_data...
[nltk_data] Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data] ~/nltk_data...
[nltk_data] Package averaged_perceptron_tagger is already up-to-
[nltk_data] date!
noun_phrases: ['bench', 'car', 'person', 'orange', 'bicycle']
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3526.)
  return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
/home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
  warnings.warn(f'position encoding of key is'
Is this expected?
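If you want to see which keys trigger the "unexpected key" message, or drop them before loading, a minimal sketch like the following works on plain dicts of parameter names. The helper names `compare_keys` and `strip_keys` and the `embeddings.position_ids` suffix filter are hypothetical, chosen for illustration. They are not an mmengine or MMDetection API.

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

V = TypeVar("V")


def compare_keys(state_dict: Dict[str, V], model_keys: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (unexpected, missing) keys, like a non-strict load would report them."""
    expected = set(model_keys)
    unexpected = sorted(k for k in state_dict if k not in expected)
    missing = sorted(k for k in expected if k not in state_dict)
    return unexpected, missing


def strip_keys(
    state_dict: Dict[str, V], suffixes: Tuple[str, ...] = ("embeddings.position_ids",)
) -> Tuple[Dict[str, V], List[str]]:
    """Return a copy without keys ending in any of `suffixes`, plus the removed keys."""
    kept = {k: v for k, v in state_dict.items() if not k.endswith(suffixes)}
    removed = sorted(k for k in state_dict if k.endswith(suffixes))
    return kept, removed


# Hypothetical stand-in data: real values would be tensors loaded from the .pth file.
prefix = "language_model.language_backbone.body.model.embeddings."
ckpt = {prefix + "position_ids": [[0, 1, 2]], prefix + "word_embeddings.weight": "w"}
model_keys = [prefix + "word_embeddings.weight"]

print(compare_keys(ckpt, model_keys))  # one unexpected key, nothing missing
cleaned, removed = strip_keys(ckpt)
print(removed)
print(compare_keys(cleaned, model_keys))  # ([], [])
```

With a real file, the dicts would typically come from `torch.load(path, map_location='cpu')['state_dict']` and `model.state_dict().keys()`. If you save a cleaned checkpoint with `torch.save`, replace only the `'state_dict'` entry so that any other metadata is kept. Stripping only makes sense if the loaded model does not expect the key. Otherwise, the key would be reported as missing instead.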
@vakker did you solve it?
@gurkirt No, but I opened a separate issue, #11583, since it seems to be a different problem. At this point, it's unclear to me whether this is a real issue at all.
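I guess the position IDs are fixed (they are just the indices 0 to N-1 used to look up the position embeddings) and can be recomputed on the fly, so there is no need to store them. They might have been saved as part of the model's state when those checkpoints were created, which is why they ended up in them.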
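The point about position IDs can be illustrated with a toy PyTorch module. In BERT-style text encoders, `position_ids` is an index tensor built from `torch.arange`, and it only selects rows of the learned position-embedding table. The sketch assumes, without confirmation in this thread, the following scenario: the installed text-encoder code registers this buffer with `persistent=False`, but the checkpoint was saved by a version that stored it. `ToyEmbeddings` is a hypothetical stand-in, not the real MMDetection or transformers class. Under that assumption, a non-strict load reports exactly one unexpected key, and the rebuilt buffer matches the stored one.

```python
import torch
from torch import nn


class ToyEmbeddings(nn.Module):
    """Hypothetical stand-in for a BERT-style embedding layer."""

    def __init__(self, max_len: int = 8, dim: int = 4, persistent: bool = True) -> None:
        super().__init__()
        # Learned weights: always part of the state dict.
        self.position_embeddings = nn.Embedding(max_len, dim)
        # Fixed indices 0..max_len-1, shape (1, max_len); `persistent` decides
        # whether they are written to the state dict.
        self.register_buffer(
            "position_ids", torch.arange(max_len).expand((1, -1)), persistent=persistent
        )

    def forward(self, seq_len: int) -> torch.Tensor:
        # Select the first seq_len indices and look up their learned embeddings.
        ids = self.position_ids[:, :seq_len]
        return self.position_embeddings(ids)


old = ToyEmbeddings(persistent=True)   # checkpoint layout: position_ids stored
new = ToyEmbeddings(persistent=False)  # model layout: position_ids rebuilt in __init__

result = new.load_state_dict(old.state_dict(), strict=False)
print(result.unexpected_keys)
print(result.missing_keys)
# The buffer is identical either way, so skipping the stored copy loses nothing.
print(torch.equal(old.position_ids, new.position_ids))
```

With `strict=True`, the same load would raise on the unexpected key instead of returning it. A warning such as "The model and loaded state dict do not match exactly" is consistent with a non-strict load.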
Maybe that's the case.
Also note that there's this warning during the forward pass: /home/user/.pyenv/versions/mmdet/lib/python3.10/site-packages/mmcv/cnn/bricks/transformer.py:524: UserWarning: position encoding of key ismissing in MultiheadAttention.
Could it be related?
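On the second warning: from my reading of mmcv's `MultiheadAttention.forward` (worth double-checking in the installed mmcv/cnn/bricks/transformer.py), the following logic applies when `key_pos` is not passed but `query_pos` is. The query encoding is reused for the key only when `query_pos` has the same shape as `key`. Otherwise, the warning is raised. The "ismissing" in the log appears to come from two concatenated f-string literals with no space between them, as the source line quoted in the log suggests. The sketch below models only that branching, using shape tuples in place of tensors. The name `resolve_key_pos` and the shapes are hypothetical. If this reading is correct, the warning depends on the shapes passed to an attention layer at inference time rather than on the checkpoint contents. Nothing in this thread confirms how it relates to the position_ids message.

```python
import warnings
from typing import Optional, Tuple

Shape = Tuple[int, ...]


def resolve_key_pos(
    query_pos: Optional[Shape], key: Shape, key_pos: Optional[Shape]
) -> Optional[Shape]:
    """Hypothetical paraphrase of the key_pos fallback in mmcv's MultiheadAttention.

    Shapes stand in for tensors; only the branching logic is modelled.
    """
    if key_pos is None and query_pos is not None:
        if query_pos == key:
            # Same shape as the key: reuse the query positional encoding.
            return query_pos
        # Shapes differ, so nothing can be reused and a warning is emitted.
        warnings.warn("position encoding of key is missing in MultiheadAttention.")
    return key_pos


# Hypothetical (num_tokens, batch, embed_dim) shapes.
print(resolve_key_pos((100, 2, 256), (100, 2, 256), None))  # reused, no warning
print(resolve_key_pos((100, 2, 256), (20, 2, 256), None))   # None, with a warning
```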
Describe the issue
I am currently trying to run Grounding DINO on Google Colab to learn more about how the model works and how I can use it. However, when I try to run the given command, i.e.,
Any help would be great!
Reproduction
In a Google Colab cell, paste the following code and run it:
Environment
sys.platform: linux
Python: 3.10.12 (main, Nov 20 2023, 15:14:05) [GCC 11.4.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0: Tesla T4
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 12.2, V12.2.140
GCC: x86_64-linux-gnu-gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.0+cu121
PyTorch compiling details: PyTorch built with:
TorchVision: 0.16.0+cu121
OpenCV: 4.8.0
MMEngine: 0.10.2
MMDetection: 3.3.0+44ebd17
Results
I am getting the following output: