salesforce / LAVIS

LAVIS - A One-stop Library for Language-Vision Intelligence
BSD 3-Clause "New" or "Revised" License

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list. Problem with torch.cat #412

Open · jameswan opened this issue 1 year ago

jameswan commented 1 year ago

 56   image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
 57
 58   # generate caption using beam search
❱ 59   model.generate({"image": image})
 60   # model.generate({"image": image, "prompt": "Describe this image. Answer:"})
 61

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\torch\utils\_contextlib.py:115 in decorate_context

 112   @functools.wraps(func)
 113   def decorate_context(*args, **kwargs):
 114       with ctx_factory():
❱ 115           return func(*args, **kwargs)

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\lavis\models\blip2_models\blip2_opt.py:220 in generate

 217   else:
 218       query_embeds = inputs_opt.repeat_interleave(num_beams, dim=0)
 219
❱ 220   outputs = self.opt_model.generate(
 221       input_ids=input_ids,
 222       query_embeds=query_embeds,
 223       attention_mask=attention_mask,

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\transformers\generation\utils.py:1627 in generate

 1626   # 13. run beam search
❱ 1627   return self.beam_search(
 1628       input_ids,
 1629       beam_scorer,
 1630       logits_processor=logits_processor,

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\transformers\generation\utils.py:2932 in beam_search

 2930   model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
 2931
❱ 2932   outputs = self(
 2933       **model_inputs,
 2934       return_dict=True,
 2935       output_attentions=output_attentions,

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\torch\nn\modules\module.py:1501 in _call_impl

 1498   if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks
 1499           or _global_backward_pre_hooks or _global_backward_hooks
 1500           or _global_forward_hooks or _global_forward_pre_hooks):
❱ 1501       return forward_call(*args, **kwargs)

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\lavis\models\blip2_models\modeling_opt.py:1037 in forward

 1036   # decoder outputs consists of (dec_features, layer_state, dec_hidden, dec_attn)
❱ 1037   outputs = self.model.decoder(
 1038       input_ids=input_ids,
 1039       attention_mask=attention_mask,
 1040       head_mask=head_mask,

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\torch\nn\modules\module.py:1501 in _call_impl

❱ 1501   return forward_call(*args, **kwargs)

C:\Users\James\anaconda3\envs\new-env\lib\site-packages\lavis\models\blip2_models\modeling_opt.py:703 in forward

 700       inputs_embeds = self.embed_tokens(input_ids)
 701
 702   if query_embeds is not None:
❱ 703       inputs_embeds = torch.cat([query_embeds, inputs_embeds], dim=1)
 704       input_shape = inputs_embeds.size()[:-1]
 705
 706   # embed positions

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 25 but got size 5 for tensor number 1 in the list.
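The failure is the torch.cat of query_embeds and inputs_embeds in LAVIS's patched modeling_opt.py: the two tensors disagree in the batch dimension (25 vs. 5). A plausible cause, not confirmed in this thread, is that newer transformers releases expand tensor model_kwargs by num_beams inside generate, so the query_embeds that blip2_opt.py already repeat_interleave'd by num_beams gets expanded a second time. A minimal, LAVIS-independent sketch of the same mismatch (all shapes here are illustrative):

import torch

num_beams = 5
batch = 1

# query_embeds expanded twice: once by LAVIS (repeat_interleave),
# once (hypothetically) by transformers' own beam-search expansion
query_embeds = torch.randn(batch * num_beams * num_beams, 32, 8)   # batch dim = 25
# inputs_embeds derived from input_ids, expanded only once
inputs_embeds = torch.randn(batch * num_beams, 4, 8)               # batch dim = 5

# torch.cat along dim=1 requires every other dimension to match, so this raises:
# RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 25 but got size 5 for tensor number 1 in the list.
torch.cat([query_embeds, inputs_embeds], dim=1)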

naokiyokoyama commented 1 year ago

Try transformers 4.26.1; that fixed it for me.
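For anyone copy-pasting, pinning to that release is just:

pip install transformers==4.26.1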

Balladie commented 1 year ago

I resolved it by reinstalling LAVIS from source. The PyPI package may need an update to work with the latest version of transformers:

pip uninstall salesforce-lavis
pip install git+https://github.com/salesforce/LAVIS.git
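After reinstalling, a quick sanity check along the lines of the LAVIS README quickstart should caption an image without hitting the torch.cat error (the model name/type follow the LAVIS model zoo; the image path is illustrative):

import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load BLIP-2 with the OPT-2.7b language model, in eval mode
model, vis_processors, _ = load_model_and_preprocess(
    name="blip2_opt", model_type="pretrain_opt2.7b", is_eval=True, device=device
)

raw_image = Image.open("example.jpg").convert("RGB")  # any RGB image
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

print(model.generate({"image": image}))  # e.g. ['a photo of ...']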