kijai / ComfyUI-Florence2

Inference Microsoft Florence2 VLM
MIT License
608 stars, 39 forks

referring_expression_segmentation and multiple segments #63

Open andrewn3 opened 3 weeks ago

andrewn3 commented 3 weeks ago

If I use "caption_to_phrase_grounding" with multiple inputs in the text prompt, e.g. bike, red car, it highlights the segments in the image, and I can get the separate output_mask_select to work. Sometimes you need to be specific, e.g. car won't select the car but red car will.

However, if I use the same text input for "referring_expression_segmentation", I can't get it to identify multiple segments; it only selects and highlights one of them. I've tried all different forms of prompt (e.g. bike(and)red car, or locate bike, red car in the image with mask). It does work when highlighting segments separately, one item at a time.

I think this is a bug.
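Since segmentation does work for one item at a time, one possible workaround (not confirmed by the maintainers, and not necessarily what the node does internally) is to split the prompt on commas, run the segmentation once per phrase, and merge the resulting masks. The sketch below only shows the split-and-merge logic. `segment_one` is a hypothetical callable standing in for whatever runs "referring_expression_segmentation" on a single phrase, and masks are assumed to be plain 2D lists of booleans of equal size.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Mask = List[List[bool]]  # assumed mask format: rows of booleans


def split_phrases(prompt: str) -> List[str]:
    """Split a comma-separated prompt into clean, non-empty phrases."""
    return [p.strip() for p in prompt.split(",") if p.strip()]


def union_masks(masks: Sequence[Mask]) -> Mask:
    """Combine same-sized masks with a per-pixel logical OR."""
    if not masks:
        raise ValueError("no masks to combine")
    height = len(masks[0])
    width = len(masks[0][0]) if height else 0
    combined = [[False] * width for _ in range(height)]
    for mask in masks:
        if len(mask) != height or any(len(row) != width for row in mask):
            raise ValueError("all masks must have the same size")
        for y in range(height):
            for x in range(width):
                combined[y][x] = combined[y][x] or mask[y][x]
    return combined


def segment_multiple(
    prompt: str,
    segment_one: Callable[[str], Mask],  # hypothetical single-phrase segmenter
) -> Tuple[Dict[str, Mask], Mask]:
    """Run segment_one once per phrase; return per-phrase masks and their union."""
    per_phrase = {phrase: segment_one(phrase) for phrase in split_phrases(prompt)}
    return per_phrase, union_masks(list(per_phrase.values()))
```

Keeping the per-phrase masks alongside the union means each item can still be selected separately, similar to what output_mask_select gives for "caption_to_phrase_grounding".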

Lilien86 commented 3 hours ago

I have the same issue

Text input (prompt) is only supported for 'referring_expression_segmentation', 'caption_to_phrase_grounding', and 'docvqa'

Lilien86 commented 3 hours ago

I found the answer in this video (https://www.youtube.com/watch?v=BRST8-yPD5A), at the 5-minute mark.