zhongpei / Comfyui_image2prompt

image to prompt by vikhyatk/moondream1
GNU General Public License v3.0
287 stars, 19 forks

Moondream Error #17

Closed: Pauweltje closed this issue 6 months ago

Pauweltje commented 7 months ago

Error occurred when executing Image2Text:

The expanded size of the tensor (807) must match the existing size (808) at non-singleton dimension 1. Target sizes: [1, 807]. Tensor sizes: [1, 808]

  File "c:\ComfyUI_windows_portable\ComfyUI\execution.py", line 152, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "c:\ComfyUI_windows_portable\ComfyUI\execution.py", line 82, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "c:\ComfyUI_windows_portable\ComfyUI\execution.py", line 75, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\image2text.py", line 78, in get_value
    result = model.answer_question(img, query)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 1223, in answer_question
    output = self.text_model.answer_question(self.cached_vision_encoder( image), question)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 1145, in answer_question
    answer = self.generate(
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 1137, in generate
    output_ids = self.model.generate(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\transformers\generation\utils.py", line 1544, in generate
    return self.greedy_search(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\transformers\generation\utils.py", line 2404, in greedy_search
    outputs = self(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 1033, in forward
    hidden_states = self.transformer(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 992, in forward
    hidden_states = layer(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 811, in forward
    attn_outputs = self.mixer(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 765, in forward
    attn_output = self._forward_cross_attn(
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 741, in _forward_cross_attn
    return self.inner_cross_attn(
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\myenv\lib\site-packages\torch\amp\autocast_mode.py", line 16, in decorate_autocast
    return func(*args, **kwargs)
  File "C:\ComfyUI_windows_portable\ComfyUI\custom_nodes\Comfyui_image2prompt\moondream_model.py", line 539, in forward
    padding_mask.masked_fill_(key_padding_mask, 0.0)
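
The call at the bottom of the trace can be reproduced in isolation. The following is a minimal sketch, assuming only what the error message reports: the tensor being filled has 807 columns while the boolean mask has 808. The helper name fill_padding_mask and the -10000.0 fill value are illustrative assumptions, not code taken from moondream_model.py.

```python
import torch


def fill_padding_mask(seqlen_k: int, key_padding_mask: torch.Tensor) -> torch.Tensor:
    """Build a (batch, seqlen_k) additive mask and zero it where the bool mask is True.

    Hypothetical helper for illustration; the fill value is an assumption.
    """
    batch_size = key_padding_mask.shape[0]
    padding_mask = torch.full((batch_size, seqlen_k), -10000.0)
    # In-place fill: the bool mask must be broadcastable to padding_mask's shape.
    padding_mask.masked_fill_(key_padding_mask, 0.0)
    return padding_mask


# Matching lengths: works.
print(fill_padding_mask(807, torch.ones(1, 807, dtype=torch.bool)).shape)

# Mask is one column longer than the target: raises RuntimeError.
try:
    fill_padding_mask(807, torch.ones(1, 808, dtype=torch.bool))
except RuntimeError as err:
    print("RuntimeError:", err)
```

If this sketch matches the real failure, the mask handed to the attention layer is one position out of step with the keys, which points at how the caller builds the mask rather than at the fill itself.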

fivecanal5 commented 7 months ago

I tried forcibly resizing key_padding_mask before it is used to fill padding_mask. That runs without errors, but the output then seems to repeat the last token over and over until it fills the entire length. The same issue also occurs with the qwen model.

This is what I did to resize the mask:

# Force key_padding_mask to have exactly seqlen_k columns.
if key_padding_mask.shape[1] != seqlen_k:
    if key_padding_mask.shape[1] > seqlen_k:
        # Mask is longer than the key sequence: drop the extra trailing columns.
        key_padding_mask = key_padding_mask[:, :seqlen_k]
    else:
        # Mask is shorter: append True entries on the right.
        padding_needed = seqlen_k - key_padding_mask.shape[1]
        padding = torch.full((batch_size, padding_needed), True, dtype=torch.bool, device=key_padding_mask.device)
        key_padding_mask = torch.cat([key_padding_mask, padding], dim=1)

fivecanal5 commented 7 months ago

This appears to be a problem with moondream itself: https://github.com/vikhyat/moondream/issues/50#issuecomment-1968783599

Downgrading transformers to 4.36.2 fixes both the error and the problem of repeating last token, but it makes the qwen model unusable.
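
Before downgrading, it can help to confirm which transformers version the environment actually has. The following is a minimal sketch using only the standard library; the helper names (installed_version, parse_release, is_newer_than) are hypothetical, and the comparison looks only at the leading numeric parts of the version string, which is a simplification.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def parse_release(v):
    """Turn '4.36.2' into (4, 36, 2), stopping at the first non-numeric piece."""
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than(v, pinned="4.36.2"):
    """True if version string v is strictly newer than the pinned release."""
    return parse_release(v) > parse_release(pinned)


if __name__ == "__main__":
    found = installed_version("transformers")
    if found is None:
        print("transformers is not installed in this environment")
    else:
        print("transformers", found, "newer than 4.36.2:", is_newer_than(found))
```

Run it with the same Python interpreter that ComfyUI uses, otherwise it reports a different environment's packages.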

Pauweltje commented 7 months ago

Hi, thanks.

Which model gives the best results in your opinion: Moondream or qwen?

I use qwen now, but sometimes the first queue starts off with an error or it has repeating errors.

all the best, P.

fivecanal5 commented 7 months ago

Depends on the use case. I mostly just use it to list the objects in an image, and moondream is quite good at this and fairly efficient. The qwen model feels to me better at detailed description, but obviously you need to deal with the repeating last token issue.

yotraxxx commented 6 months ago

Downgrading transformers to 4.36.2 fixes both the error and the problem of repeating last token, but it makes the qwen model unusable.

I'm new to handling this kind of case: how do you downgrade transformers?

SoaringTiger commented 6 months ago

Downgrading transformers to 4.36.2 fixes both the error and the problem of repeating last token, but it makes the qwen model unusable.

I'm new to handling this kind of case: how do you downgrade transformers?

pip install transformers==4.36.2

zhongpei commented 6 months ago

pip install transformers==4.37.1
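
Either pin only takes effect if pip runs in the environment ComfyUI loads from; the traceback above shows packages under ComfyUI\myenv\lib\site-packages. The following is a minimal sketch that builds the pip command around the currently running interpreter, so the install lands in that interpreter's environment. The helper names pip_install_command and install_pinned are hypothetical.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str, version: str) -> List[str]:
    """Build a pip command bound to the interpreter that is running this script."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


def install_pinned(package: str, version: str) -> None:
    """Run the install; raises CalledProcessError if pip exits with an error."""
    subprocess.check_call(pip_install_command(package, version))


if __name__ == "__main__":
    # Only prints the command; call install_pinned("transformers", "4.36.2")
    # to actually change the environment.
    print(" ".join(pip_install_command("transformers", "4.36.2")))
```

Launching this script with the Python executable inside the ComfyUI environment is what ties the install to that environment.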