open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

ValueError: too many values to unpack (expected 2) #11750

Closed: ykwongaq closed this issue 3 months ago

ykwongaq commented 4 months ago

Dear mmdetection team,

First of all, thank you for creating such a great toolbox for model training; I really appreciate it.

I want to test Grounding DINO and am following the instructions here: https://github.com/open-mmlab/mmdetection/blob/main/configs/mm_grounding_dino/usage.md

When I try to execute the inference code for closed-set object detection:

python demo/image_demo.py images/animals.png \
        configs/mm_grounding_dino/grounding_dino_swin-t_pretrain_obj365.py \
        --weights grounding_dino_swin-t_pretrain_obj365_goldg_grit9m_v3det_20231204_095047-b448804b.pth \
        --texts '$: coco'

I got the following error:

ValueError: too many values to unpack (expected 2)

when executing bsz, src_len = mask.size() in _expand_mask at anaconda3/envs/openmmlab/lib/python3.8/site-packages/transformers/modeling_attn_mask_utils.py, line 173.

I checked and found that mask is a 3-dimensional tensor, so mask.size() returns three values.
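
To illustrate, a minimal reproduction of just the failing unpack (the shape here is made up; the real mask comes from the language model):

import torch

mask = torch.ones(1, 9, 9, dtype=torch.bool)  # 3-D: (batch, seq_len, seq_len)
bsz, src_len = mask.size()  # ValueError: too many values to unpack (expected 2)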

May I ask which part goes wrong?

Thank you.

Not sure whether this will be helpful, but here is my environment:

sys.platform: linux
Python: 3.8.19 (default, Mar 20 2024, 19:58:24) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.7, V11.7.64
GCC: gcc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0
PyTorch: 2.3.0+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX512
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.18.0+cu121
OpenCV: 4.9.0
MMEngine: 0.10.4
MMDetection: 3.3.0+cfd5d3a
szlll commented 4 months ago

Same error. Have you fixed it?

ykwongaq commented 4 months ago

Nope, I guess I need to wait for the author to fix it.

szlll commented 4 months ago

@ykwongaq Hello, I have a solution here: in transformers/models/bert/modeling_bert.py, I replaced lines 1104-1106 with extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape). After that, the program runs and the results seem correct, but I'm not sure this is the right fix. Maybe you can try it.
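
For concreteness, a sketch of that edit (exact line numbers differ between transformers releases; the point is to route through the generic mask helper, which accepts both 2-D and 3-D masks, instead of the SDPA-specific one, which unpacks mask.shape into exactly two values):

# In BertModel.forward, transformers/models/bert/modeling_bert.py, around
# the reported lines. get_extended_attention_mask() broadcasts a 3-D
# (bs, seq_len, seq_len) mask to (bs, 1, seq_len, seq_len), so the 3-D
# mask produced by mmdet no longer hits the 2-D-only SDPA path.
extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape)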

droidlyx commented 4 months ago

I also have the same problem, thanks for your solution!

HamzaIbrarpy commented 4 months ago

Same issue. It occurs with all of the Grounding DINO models that use BERT as the language model (which I think is all of them). The problem seems to be in mmdet/models/language_models/bert.py, in the function generate_masks_with_special_tokens_and_transfer_map(tokenized, special_tokens_list):

# generate attention mask and positional ids
# NOTE: torch.eye(...).unsqueeze(0).repeat(bs, 1, 1) yields a 3-D mask of
# shape (bs, num_token, num_token), not the 2-D (bs, num_token) mask that
# newer transformers releases expect.
attention_mask = (
    torch.eye(num_token,
              device=input_ids.device).bool().unsqueeze(0).repeat(
                  bs, 1, 1))
position_ids = torch.zeros((bs, num_token), device=input_ids.device)
previous_col = 0
for i in range(idxs.shape[0]):
    row, col = idxs[i]
    if (col == 0) or (col == num_token - 1):
        attention_mask[row, col, col] = True
        position_ids[row, col] = 0
    else:
        # tokens between two consecutive special tokens attend only to
        # each other
        attention_mask[row, previous_col + 1:col + 1,
                       previous_col + 1:col + 1] = True
        position_ids[row, previous_col + 1:col + 1] = torch.arange(
            0, col - previous_col, device=input_ids.device)
    previous_col = col

return attention_mask, position_ids.to(torch.long)

This generates a 3-D attention mask, whereas _prepare_4d_attention_mask_for_sdpa in transformers/modeling_attn_mask_utils.py expects a 2-D tensor:

File "/usr/local/lib/python3.10/dist-packages/transformers/models/bert/modeling_bert.py", line 1118, in forward
    extended_attention_mask = _prepare_4d_attention_mask_for_sdpa(
File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_attn_mask_utils.py", line 439, in _prepare_4d_attention_mask_for_sdpa
    batch_size, key_value_length = mask.shape
ValueError: too many values to unpack (expected 2)

It seems to be a version problem. Please help, thank you.
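
If you would rather not edit the installed transformers sources, another option that may work (untested, and the model name below is just what Grounding DINO typically uses) is to force the eager attention implementation so BERT never enters the SDPA mask path; attn_implementation is accepted by from_pretrained in recent transformers releases:

from transformers import BertModel

# Hypothetical tweak where mmdet constructs its BERT language model
# (mmdet/models/language_models/bert.py): with eager attention, BERT goes
# through get_extended_attention_mask(), which accepts 3-D masks.
model = BertModel.from_pretrained('bert-base-uncased',
                                  attn_implementation='eager')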

77h2l commented 4 months ago

Just try lowering the transformers version.

HamzaIbrarpy commented 4 months ago

Just try lowering the transformers version.

Any specific version I should downgrade to? I am using 4.41.1.

DoUntilFalse commented 3 months ago

Just try lowering the transformers version.

Any specific version I should downgrade to? I am using 4.41.1.

4.38.0 works for me.
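
For reference, the downgrade is a one-liner; BERT's SDPA attention path, which appears to have been introduced around transformers 4.41, is what trips over the 3-D mask, and 4.38.0 predates it:

pip install transformers==4.38.0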

ykwongaq commented 3 months ago

@ykwongaq Hello, I have a solution here: in transformers/models/bert/modeling_bert.py, I replaced lines 1104-1106 with extended_attention_mask = self.get_extended_attention_mask(attention_mask, input_shape). After that, the program runs and the results seem correct, but I'm not sure this is the right fix. Maybe you can try it.

Great, thanks! It works for me.