kijai / ComfyUI-Florence2

Inference Microsoft Florence2 VLM
MIT License

torch.cat(): expected a non-empty list of Tensors #3

Closed. LankyPoet closed 1 week ago

LankyPoet commented 1 week ago

Hi, thanks for making this! I am running Windows 11, the latest ComfyUI, Python 3.11.9, and have flash-attn successfully installed.

When I select "caption" or "detailed caption" etc. with any of the model options, if I leave the text input blank, I receive the below error. If I enter text, I receive a different error (pasted that below the first error).


Error occurred when executing Florence2Run:

torch.cat(): expected a non-empty list of Tensors

  File "D:\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-Florence2\nodes.py", line 256, in encode
    out_tensor = torch.cat(out, dim=0)
                 ^^^^^^^^^^^^^^^^^^^^^

2nd error:

!!! Exception during processing!!! Task token <CAPTION> should be the only token in the text.
Traceback (most recent call last):
  File "D:\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-Florence2\nodes.py", line 135, in encode
    inputs = processor(text=prompt, images=image_pil, return_tensors="pt", do_rescale=False).to(device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.LivingRoomPC\.cache\huggingface\modules\transformers_modules\Florence-2-base\processing_florence2.py", line 266, in __call__
    text = self._construct_prompts(text)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\Default.LivingRoomPC\.cache\huggingface\modules\transformers_modules\Florence-2-base\processing_florence2.py", line 145, in _construct_prompts
    assert _text == task_token, f"Task token {task_token} should be the only token in the text."
           ^^^^^^^^^^^^^^^^^^^
AssertionError: Task token <CAPTION> should be the only token in the text.
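
For readers hitting the same assertion, here is a minimal sketch of the kind of check the traceback points at. Only the assertion itself is taken from the traceback; the function name `construct_prompt`, the `TASK_PROMPTS_WITHOUT_INPUTS` mapping, and the prompt strings in it are assumptions made up for illustration and are not copied from `processing_florence2.py`.

```python
# Hypothetical stand-in for the processor's task-token check.
# The mapping and its prompt strings are illustrative assumptions.
TASK_PROMPTS_WITHOUT_INPUTS = {
    "<CAPTION>": "What does the image describe?",
    "<DETAILED_CAPTION>": "Describe in detail what is shown in the image.",
}


def construct_prompt(text: str) -> str:
    """Expand a bare task token into its fixed prompt.

    Raises AssertionError when a fixed-prompt task token is mixed with
    any other text, mirroring the assertion shown in the traceback.
    """
    for task_token, task_prompt in TASK_PROMPTS_WITHOUT_INPUTS.items():
        if task_token in text:
            assert text == task_token, (
                f"Task token {task_token} should be the only token in the text."
            )
            return task_prompt
    # No fixed-prompt task token found: pass the text through unchanged.
    return text
```

Under these assumptions, `construct_prompt("<CAPTION>")` returns the fixed prompt, while `construct_prompt("<CAPTION>describe the dog")` raises the `AssertionError` quoted above, which would explain why extra text on a caption task fails.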

kijai commented 1 week ago

I thought I fixed this. Does this happen with the latest version?

LankyPoet commented 1 week ago

You are correct! Looks good now. Sorry for the false alarm; I didn't realize I had so quickly fallen a version behind. Thank you!

When I add text instructions to the caption, I now get this, which I assume is correct behavior though I didn't know we couldn't guide it further like other VLMs:

Error occurred when executing Florence2Run:

Text input (prompt) is only supported for 'referring_expression_segmentation' and 'caption_to_phrase_grounding'

  File "D:\ComfyUI\execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\ComfyUI\custom_nodes\ComfyUI-Florence2\nodes.py", line 160, in encode
    raise ValueError("Text input (prompt) is only supported for 'referring_expression_segmentation' and 'caption_to_phrase_grounding'")

kijai commented 1 week ago

It's brand new, so expect lots of updates. I think I'm stopping for today though, as I've made many changes!

That error says it all, only those options can use the input. I'm not sure how to make it clearer, without making separate nodes that is, which I'd rather not do as it's just more upkeep.
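
A minimal sketch of such a guard, for illustration only: the error message and the two task names come from the traceback above, while `build_prompt`, `TASKS_ACCEPTING_TEXT`, the `task_token` parameter, and joining the token and text with a space are hypothetical choices, not the node's actual code.

```python
# Hypothetical guard; only the error message and the two task names
# are taken from the traceback in this thread.
TASKS_ACCEPTING_TEXT = {
    "referring_expression_segmentation",
    "caption_to_phrase_grounding",
}


def build_prompt(task: str, task_token: str, text_input: str = "") -> str:
    """Return the prompt for a task, rejecting text where it is unsupported."""
    if text_input:
        if task not in TASKS_ACCEPTING_TEXT:
            raise ValueError(
                "Text input (prompt) is only supported for "
                "'referring_expression_segmentation' and "
                "'caption_to_phrase_grounding'"
            )
        # Assumption: the token and the user text are joined with a space.
        return f"{task_token} {text_input}"
    # Without text input, the bare task token is the whole prompt.
    return task_token
```

With this sketch, `build_prompt("caption", "<CAPTION>")` returns just the token, and passing any text for the `caption` task raises the `ValueError` shown earlier. Failing early in the node keeps the message readable instead of surfacing the processor's assertion.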

LankyPoet commented 1 week ago

I completely understand. The error message is good and makes sense. I just personally didn't know that Florence wouldn't take in some "system prompting" to guide output like other VLMs I've used. Take a break! Thank you again for making this, nice work.