kyegomez / ScreenAI

Implementation of the ScreenAI model from the paper: "A Vision-Language Model for UI and Infographics Understanding"
https://discord.gg/GYbXvDGevY
MIT License
247 stars, 26 forks

runtime error when executing the default example #1

Open NuiMrme opened 5 months ago

NuiMrme commented 5 months ago

Describe the bug: After `pip install screenai`, running the default example raises a runtime error at the line `from screenai.main import ScreenAI`: RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512)

To Reproduce: Steps to reproduce the behavior:

  1. Run `pip install screenai`.
  2. Run the default example.

Expected behavior: The example runs without error.

Screenshots:

```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_20976\3292023021.py in <cell line: 2>()
      1 import torch
----> 2 from screenai.main import ScreenAI
      3 
      4 # Create a tensor for the image
      5 image = torch.rand(1, 3, 224, 224)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\screenai\__init__.py in <module>
----> 1 from screenai.main import (
      2     CrossAttention,
      3     MultiModalEncoder,
      4     MultiModalDecoder,
      5     ScreenAI,

~\AppData\Local\Programs\Python\Python39\lib\site-packages\screenai\main.py in <module>
      5 from torch import Tensor, einsum, nn
      6 from torch.autograd import Function
----> 7 from zeta.nn import (
      8     SwiGLU,
      9     FeedForward,

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\__init__.py in <module>
     26 logger.addFilter(f)
     27 
---> 28 from zeta.nn import *
     29 from zeta.models import *
     30 from zeta.utils import *

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\__init__.py in <module>
      1 from zeta.nn.attention import *
      2 from zeta.nn.embeddings import *
----> 3 from zeta.nn.modules import *
      4 from zeta.nn.biases import *

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\__init__.py in <module>
     45 from zeta.nn.modules.s4 import s4d_kernel
     46 from zeta.nn.modules.h3 import H3Layer
---> 47 from zeta.nn.modules.mlp_mixer import MLPMixer
     48 from zeta.nn.modules.leaky_relu import LeakyRELU
     49 from zeta.nn.modules.adaptive_layernorm import AdaptiveLayerNorm

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in <module>
    143     1, 512, 32, 32
    144 )  # Batch size of 1, 512 channels, 32x32 image
--> 145 output = mlp_mixer(example_input)
    146 print(
    147     output.shape

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
    123         x = rearrange(x, "n c h w -> n (h w) c")
    124         for mixer_block in self.mixer_blocks:
--> 125             x = mixer_block(x)
    126         x = self.pred_head_layernorm(x)
    127         x = x.mean(dim=1)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
     61         y = self.norm1(x)
     62         y = rearrange(y, "n c t -> n t c")
---> 63         y = self.tokens_mlp(y)
     64         y = rearrange(y, "n t c -> n c t")
     65         x = x + y

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\zeta\nn\modules\mlp_mixer.py in forward(self, x)
     28             torch.Tensor: description
     29         """
---> 30         y = self.dense1(x)
     31         y = F.gelu(y)
     32         return self.dense2(y)

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _wrapped_call_impl(self, *args, **kwargs)
   1516             return self._compiled_call_impl(*args, **kwargs)  # type: ignore[misc]
   1517         else:
-> 1518             return self._call_impl(*args, **kwargs)
   1519 
   1520     def _call_impl(self, *args, **kwargs):

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *args, **kwargs)
   1525                 or _global_backward_pre_hooks or _global_backward_hooks
   1526                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1527             return forward_call(*args, **kwargs)
   1528 
   1529         try:

~\AppData\Local\Programs\Python\Python39\lib\site-packages\torch\nn\modules\linear.py in forward(self, input)
    112 
    113     def forward(self, input: Tensor) -> Tensor:
--> 114         return F.linear(input, self.weight, self.bias)
    115 
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (512x4 and 512x512)
```
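Reading the traceback: the crash happens while `import zeta` runs, because the installed `zeta/nn/modules/mlp_mixer.py` executes an example at module level (line 145), and the last frame is a plain `F.linear` call. The snippet below is a minimal sketch that reproduces the same kind of error in isolation. It assumes only that the failing layer behaves like `nn.Linear(512, 512)`, which is inferred from the `512x512` operand in the message and not verified against the zeta source.

```python
import torch
from torch import nn

# Assumption: the failing layer acts like nn.Linear(512, 512); this is inferred
# from the "512x512" operand in the error message, not read from the zeta source.
layer = nn.Linear(512, 512)

# The last dimension is 4 where the layer expects 512 input features.
# PyTorch flattens the leading dims (1 * 512 = 512 rows), giving a 512x4 "mat1".
bad_input = torch.rand(1, 512, 4)

message = ""
try:
    layer(bad_input)
except RuntimeError as err:
    message = str(err)
print(message)

# An input whose last dimension is 512 passes through the same layer.
ok_output = layer(torch.rand(1, 4, 512))
print(ok_output.shape)  # torch.Size([1, 4, 512])
```

Because the faulty code runs at import time, the error appears before any ScreenAI code is executed, so changing the arguments passed to `ScreenAI` cannot avoid it.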

github-actions[bot] commented 5 months ago

Hello there, thank you for opening an issue! 🙏🏻 The team was notified and they will get back to you ASAP.

DevChrisRoth commented 5 months ago

Got the same issue on an M1 Mac.

emarashliev commented 5 months ago

Same here, on an Intel Mac.

carlitose commented 5 months ago

Same on an M2 Mac.

zhaixiaowai commented 4 months ago

Same on Windows 11 with WSL.

Yingrjimsch commented 4 months ago

Same here, on Windows 10.

Edit: I solved this issue by running `pip uninstall zetascale` and then reinstalling with `pip install zetascale`. In my case an ancient version (0.9.xyz) had been installed; after I installed the newest version, 2.2.7, it worked.
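Before reinstalling, it can help to check which zetascale version is actually installed. This is a minimal sketch using the standard library's `importlib.metadata`. The helper names are hypothetical, and the cutoff at major version 2 is an assumption drawn only from this comment (0.9.x failed, 2.2.7 worked), not a documented compatibility bound.

```python
from importlib.metadata import PackageNotFoundError, version


def zetascale_version():
    """Return the installed zetascale version string, or None if it is absent."""
    try:
        return version("zetascale")
    except PackageNotFoundError:
        return None


def is_old_zetascale(ver, min_major=2):
    # Assumed threshold: this thread reports 0.9.x failing and 2.2.7 working.
    major = int(ver.split(".")[0])
    return major < min_major


installed = zetascale_version()
if installed is None:
    print("zetascale is not installed")
elif is_old_zetascale(installed):
    print(f"zetascale {installed} looks outdated; try: pip install --upgrade zetascale")
else:
    print(f"zetascale {installed} is installed")
```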

@kyegomez, it might be good to update the README example to match the one in example.py. After solving this issue, I ran into more problems because:

  1. `num_tokens` was not defined
  2. `max_seq_len` was not defined
  3. `image` and `text` were not initialized with the right dimensions
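The three problems listed above can be caught before the model is called. This is a hedged sketch. `check_screenai_inputs` is a hypothetical helper, and its rules are assumptions based on common transformer conventions, not checks taken from the screenai code:

- token ids lie in `[0, num_tokens)`
- the sequence length is at most `max_seq_len`
- images are shaped `(batch, 3, image_size, image_size)`

```python
import torch


def check_screenai_inputs(text, image, num_tokens, max_seq_len, image_size, channels=3):
    """Hypothetical pre-flight check for ScreenAI-style inputs.

    Assumed conventions (not confirmed from the screenai source): text is a
    (batch, seq_len) int64 tensor of token ids; image is (batch, C, H, W).
    """
    if text.dim() != 2:
        raise ValueError(f"text must be 2-D (batch, seq_len), got {tuple(text.shape)}")
    if text.dtype != torch.long:
        raise ValueError(f"text must hold int64 token ids, got {text.dtype}")
    if text.shape[1] > max_seq_len:
        raise ValueError(f"sequence length {text.shape[1]} exceeds max_seq_len={max_seq_len}")
    if int(text.min()) < 0 or int(text.max()) >= num_tokens:
        raise ValueError(f"token ids must lie in [0, {num_tokens})")
    expected = (channels, image_size, image_size)
    if image.dim() != 4 or tuple(image.shape[1:]) != expected:
        raise ValueError(f"image must be (batch, {channels}, {image_size}, {image_size}), got {tuple(image.shape)}")
    if image.shape[0] != text.shape[0]:
        raise ValueError("image and text batch sizes differ")


# Illustrative values only; torch.randint returns int64 by default.
image = torch.rand(1, 3, 224, 224)
text = torch.randint(0, 2000, (1, 1024))
check_screenai_inputs(text, image, num_tokens=2000, max_seq_len=1024, image_size=224)
```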

Another question: how did you choose `num_tokens` and `max_seq_len`?

github-actions[bot] commented 2 months ago

Stale issue message

MElmardi commented 1 month ago

Same on Linux (Ubuntu 24 LTS).

RokiRan commented 1 month ago

After my modifications I got the code working. I hope this solves your problem.

import torch
from screenai.main import ScreenAI

# Create the image tensor
image = torch.rand(1, 3, 224, 224)

# Create an instance of the ScreenAI model
model = ScreenAI(
    num_tokens=2000,
    max_seq_len=1024,
    patch_size=16,
    image_size=224,
    dim=512,
    depth=6,
    heads=8,
    vit_depth=4,
    multi_modal_encoder_depth=4,
    llm_decoder_depth=4,
    mm_encoder_ff_mult=4,
)

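# Assume your text has already been converted to token indices; here we simulate them with random integers
# num_tokens is your vocabulary size; max_seq_len is the maximum sequence length the model can handle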
text_indices = torch.randint(0, model.num_tokens, (1, model.max_seq_len))

# Convert the text index tensor to a long (int64) tensor
text = text_indices.long()

# Run the model's forward pass on the given text and image tensors
out = model(text, image)

# Print the shape of the output tensor
print(out.shape)