salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

Error in beam-search multinomial sampling #78

Open Richar-Du opened 1 year ago

Richar-Du commented 1 year ago

According to the Hugging Face transformers library, beam-search multinomial sampling can be enabled by setting num_beams>1 and do_sample=True. However, this is not supported in LAVIS. If I set num_beams=4, num_return_sequences=4, and do_sample=True simultaneously, I get the following error:

File "MM/LAVIS/lavis/models/med.py", line 1405, in generate_from_encoder
    outputs = self.generate(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
    return func(*args, **kwargs)
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/transformers/generation_utils.py", line 1404, in generate
    return self.beam_sample(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/transformers/generation_utils.py", line 2520, in beam_sample
    outputs = self(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 1211, in forward
    outputs = self.bert(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 974, in forward
    encoder_outputs = self.encoder(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 592, in forward
    layer_outputs = layer_module(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 475, in forward
    cross_attention_outputs = self.crossattention(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 346, in forward
    self_outputs = self.self(
  File "miniconda3/envs/lavis/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "MM/LAVIS/lavis/models/med.py", line 219, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: The size of tensor a (36) must match the size of tensor b (4) at non-singleton dimension 0

During generation, the sizes are normal when generating the first token: both query_layer and key_layer are torch.Size([64, 12, 5, 64]). However, when generating the second token, the size of key_layer becomes torch.Size([4, 12, 577, 64]). So I think there may be something wrong in the image captioning path. By the way, 5 is my prompt length and 12 is the number of attention heads.
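The shapes above suggest the decoder side was expanded to batch_size * num_beams rows while the cross-attention key/value side was not. A minimal sketch of that mismatch (hypothetical sizes chosen to mirror the shapes in the traceback; repeat_interleave is one common way transformers-style beam search expands encoder states, not necessarily what LAVIS does internally):

```python
import torch

batch_size, num_beams, num_heads, prompt_len, head_dim = 16, 4, 12, 5, 64
image_len = 577  # e.g. number of ViT patch tokens

# Beam sampling expands the decoder inputs to batch_size * num_beams rows.
query_layer = torch.randn(batch_size * num_beams, num_heads, prompt_len, head_dim)

# If the image (encoder) states are NOT expanded alongside, cross-attention
# sees mismatched batch dimensions and matmul raises a RuntimeError:
key_layer = torch.randn(batch_size, num_heads, image_len, head_dim)
try:
    torch.matmul(query_layer, key_layer.transpose(-1, -2))
except RuntimeError as e:
    print("mismatch:", e)

# Expanding the encoder states along the batch dimension restores the shapes:
key_layer_expanded = key_layer.repeat_interleave(num_beams, dim=0)
scores = torch.matmul(query_layer, key_layer_expanded.transpose(-1, -2))
print(scores.shape)  # torch.Size([64, 12, 5, 577])
```

If the encoder states are only expanded for the first step (or expanded with the wrong factor), the second generation step would fail exactly as in the traceback.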

Could you figure out where the error is? Thanks in advance :)

dxli94 commented 1 year ago

Hi, @Richar-Du,

I think something fishy might be going on. I will investigate this. Thanks for raising it.

nullhty commented 1 year ago

I ran into the same problem because my version of transformers was incorrect. You may want to check whether your version of transformers is lower than 4.27.
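A quick way to check the installed version is via importlib.metadata; note the <4.27 cutoff is taken from this thread, not from an official compatibility table, and transformers_older_than is a hypothetical helper:

```python
from importlib.metadata import PackageNotFoundError, version


def transformers_older_than(cutoff: str) -> bool:
    """Return True if the installed transformers version sorts below `cutoff`."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False  # not installed at all

    # Compare dotted version strings numerically, field by field,
    # dropping non-numeric suffixes such as "dev0".
    def to_tuple(v: str) -> tuple:
        return tuple(int(p) for p in v.split(".")[:3] if p.isdigit())

    return to_tuple(installed) < to_tuple(cutoff)


print(transformers_older_than("4.27"))
```

If this prints False and you hit the error above, downgrading (e.g. pip install "transformers<4.27") may be worth trying.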