gunesevitan opened this issue 1 year ago
I have to use transformers 4.27 because the latest version of clip-interrogator requires that specific version. After upgrading transformers from 4.26 to 4.27, I got the following error:
```text
Traceback (most recent call last):

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:168 in <module>

   165 │ for step, inputs in enumerate(progress_bar):
   166 │
   167 │     inputs = inputs.to(device)
 ❱ 168 │     batch_predictions = predict_blip(
   169 │         inputs=inputs,
   170 │         model=blip_model,
   171 │         nucleus_sampling=False,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/src/image_captioning/blip_.py:92 in predict_blip

    89 │ """
    90 │
    91 │ with torch.no_grad(), torch.autocast(device_type=device.type, dtype=torch.float16):
 ❱  92 │     outputs = model.generate(
    93 │         samples={'image': inputs},
    94 │         use_nucleus_sampling=nucleus_sampling,
    95 │         num_beams=num_beams,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/blip_models/blip_caption.py:188 in generate

   185 │     prompt.input_ids = prompt.input_ids[:, :-1]
   186 │
   187 │     # get decoded text
 ❱ 188 │     decoder_out = self.text_decoder.generate_from_encoder(
   189 │         tokenized_prompt=prompt,
   190 │         visual_embeds=image_embeds,
   191 │         sep_token_id=self.tokenizer.sep_token_id,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:1363 in generate_from_encoder

   1360 │         )
   1361 │     else:
   1362 │         # beam search
 ❱ 1363 │         outputs = self.generate(
   1364 │             input_ids=tokenized_prompt.input_ids,
   1365 │             max_length=max_length,
   1366 │             min_length=min_length,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/autograd/grad_mode.py:27 in decorate_context

   24 │     @functools.wraps(func)
   25 │     def decorate_context(*args, **kwargs):
   26 │         with self.clone():
 ❱ 27 │             return func(*args, **kwargs)
   28 │     return cast(F, decorate_context)
   29 │
   30 │ def _wrap_generator(self, func):

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/transformers/generation/utils.py:1490 in generate

   1487 │             **model_kwargs,
   1488 │         )
   1489 │         # 13. run beam search
 ❱ 1490 │         return self.beam_search(
   1491 │             input_ids,
   1492 │             beam_scorer,
   1493 │             logits_processor=logits_processor,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/transformers/generation/utils.py:2749 in beam_search

   2746 │
   2747 │         model_inputs = self.prepare_inputs_for_generation(input_ids, **model_kwargs)
   2748 │
 ❱ 2749 │         outputs = self(
   2750 │             **model_inputs,
   2751 │             return_dict=True,
   2752 │             output_attentions=output_attentions,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:1213 in forward

   1210 │     if labels is not None:
   1211 │         use_cache = False
   1212 │
 ❱ 1213 │     outputs = self.bert(
   1214 │         input_ids,
   1215 │         attention_mask=attention_mask,
   1216 │         position_ids=position_ids,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:977 in forward

   974 │     else:
   975 │         embedding_output = encoder_embeds
   976 │
 ❱ 977 │     encoder_outputs = self.encoder(
   978 │         embedding_output,
   979 │         attention_mask=extended_attention_mask,
   980 │         head_mask=head_mask,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:595 in forward

   592 │                 mode=mode,
   593 │             )
   594 │         else:
 ❱ 595 │             layer_outputs = layer_module(
   596 │                 hidden_states,
   597 │                 attention_mask,
   598 │                 layer_head_mask,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:478 in forward

   475 │             outputs = outputs + cross_attention_outputs[1:-1]
   476 │
   477 │         else:
 ❱ 478 │             cross_attention_outputs = self.crossattention(
   479 │                 attention_output,
   480 │                 attention_mask,
   481 │                 head_mask,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:349 in forward

   346 │     past_key_value=None,
   347 │     output_attentions=False,
   348 │ ):
 ❱ 349 │     self_outputs = self.self(
   350 │         hidden_states,
   351 │         attention_mask,
   352 │         head_mask,

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/torch/nn/modules/module.py:1194 in _call_impl

   1191 │     # this function, and just call forward.
   1192 │     if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o
   1193 │             or _global_forward_hooks or _global_forward_pre_hooks):
 ❱ 1194 │         return forward_call(*input, **kwargs)
   1195 │     # Do not call functions when jit is used
   1196 │     full_backward_hooks, non_full_backward_hooks = [], []
   1197 │     if self._backward_hooks or _global_backward_hooks:

/home/gunes/Desktop/Kaggle/stable-diffusion-image-to-prompts/venv_competition/lib/python3.9/site-packages/lavis/models/med.py:222 in forward

   219 │     print('query', query_layer.shape)
   220 │     print('key', key_layer.shape)
   221 │     print('key t', key_layer.transpose(-1, -2).shape)
 ❱ 222 │     attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
   223 │
   224 │     if (
   225 │         self.position_embedding_type == "relative_key"

RuntimeError: The size of tensor a (48) must match the size of tensor b (144) at non-singleton dimension 0
```
I'm not sure whether the size 144 in the first dimension is correct here. What change in transformers 4.27 is causing this?
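One hypothesis that fits the numbers (not confirmed anywhere in this thread): 144 = 3 × 48, which is what you would get if the image embeddings were expanded for beam search twice, once by LAVIS before it calls `generate()` and once more inside `generate()`, while the decoder `input_ids` are expanded only once. The pure-Python sketch below reproduces that arithmetic. The batch size of 16 and `num_beams=3` are assumptions chosen to match the error message, and `repeat_interleave` here is a list-based stand-in for `torch.repeat_interleave(x, repeats, dim=0)`, not the real tensor op.

```python
def repeat_interleave(rows, repeats):
    """Pure-Python stand-in for torch.repeat_interleave(x, repeats, dim=0)."""
    return [row for row in rows for _ in range(repeats)]


batch_size = 16  # hypothetical: images per batch
num_beams = 3    # assumption: beam count used for captioning

images = list(range(batch_size))

# Decoder side: input_ids are expanded once for beam search.
query_rows = repeat_interleave(images, num_beams)

# Encoder side (hypothesis): LAVIS expands the image embeddings itself,
# and generate() in transformers 4.27 expands them a second time.
image_embeds = repeat_interleave(images, num_beams)
key_rows = repeat_interleave(image_embeds, num_beams)

print(len(query_rows), len(key_rows))  # 48 144
```

Under this reading, the query side (48 rows) and the key side (144 rows) disagree at dimension 0, which is exactly the mismatch `torch.matmul` rejects in the last frame of the traceback.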
Yes, I asked the same question yesterday: we need to downgrade transformers. You can see that requirements.txt constrains the transformers package to `transformers>=4.25.0,<4.27`, so the version has to be below 4.27. 4.25 works, at least (that's the version I'm using).
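If you stay on the pinned range, a fail-fast check can surface the problem before the model is loaded, instead of as a shape error deep inside beam search. A minimal sketch, assuming the pin `transformers>=4.25.0,<4.27` from requirements.txt; `release_tuple` and `satisfies_pin` are hypothetical helpers, and the parser deliberately ignores pre-release and dev suffixes. For full PEP 440 semantics, `packaging.specifiers.SpecifierSet` would be the more robust choice.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def release_tuple(ver: str) -> tuple:
    """Parse the numeric release part, e.g. '4.27.0.dev0' -> (4, 27, 0).

    Simplification: pre-release/dev suffixes are ignored.
    """
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def satisfies_pin(ver: str, lower: str = "4.25.0", upper: str = "4.27") -> bool:
    """True if lower <= ver < upper, mirroring transformers>=4.25.0,<4.27."""
    return release_tuple(lower) <= release_tuple(ver) < release_tuple(upper)


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        installed = None
    if installed is None:
        print("transformers is not installed")
    elif not satisfies_pin(installed):
        print(f"transformers {installed} is outside the BLIP pin >=4.25.0,<4.27")
    else:
        print(f"transformers {installed} satisfies the pin")
```

Running this at startup turns the version conflict into an explicit message rather than a `RuntimeError` from `torch.matmul`.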
Yeah, I figured that out, but I have to use transformers 4.27 :/
We have updated the BLIP-2 OPT models so that they work with the latest transformers versions (>=4.27).
Does the BLIP model work with transformers>=4.27 too?
The BLIP model does not work with transformers>=4.27.
Could you explain why BLIP doesn't work with transformers>=4.27? I have to use transformers>4.27; is it possible to modify it locally so that it works with the BLIP model? Thanks in advance.
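On the question of patching locally: if the problem is that the image embeddings end up expanded for beam search twice (an assumption; nobody in the thread confirms the cause), a local fix would need to make sure they are expanded exactly once. The sketch below models that idea in pure Python rather than patching LAVIS or transformers directly. `prepare_encoder_states` and `fake_generate` are hypothetical names, the `(4, 27)` cut-over reflects the versions reported in this thread, and `fake_generate` only mimics the assumed behaviour of newer `generate()` implementations that expand tensor keyword arguments themselves.

```python
def repeat_interleave(rows, repeats):
    """Pure-Python stand-in for torch.repeat_interleave(x, repeats, dim=0)."""
    return [row for row in rows for _ in range(repeats)]


def fake_generate(encoder_states, num_beams, tf_version):
    """Hypothetical model of generate(): assumes >=4.27 expands tensor kwargs itself."""
    if tf_version >= (4, 27):
        return repeat_interleave(encoder_states, num_beams)
    return encoder_states


def prepare_encoder_states(image_embeds, num_beams, tf_version):
    """Hypothetical caller-side patch: expand only when generate() will not."""
    if tf_version >= (4, 27):
        return image_embeds
    return repeat_interleave(image_embeds, num_beams)


batch = list(range(16))  # hypothetical batch of 16 image embeddings
num_beams = 3            # assumed beam count

for tf_version in [(4, 26, 1), (4, 27, 0)]:
    states = fake_generate(
        prepare_encoder_states(batch, num_beams, tf_version), num_beams, tf_version
    )
    # Decoder input_ids are expanded once, so this should be 16 * 3 = 48 rows.
    print(tf_version, len(states))
```

Gating on the installed version keeps the same caller code consistent on both sides of the 4.27 boundary; in a real patch, the gate would presumably sit wherever LAVIS expands the visual embeddings before calling `generate()`.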