How can I calculate the prediction score of every pixel using the Mask2Former Swin-L model?

funny000 commented 1 week ago

Feature request

I have downloaded the Mask2Former Swin-L model from the Hugging Face website and used the example code to get the segmentation map of an image. The example code is:

```python
import requests
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

# load Mask2Former fine-tuned on COCO panoptic segmentation
processor = AutoImageProcessor.from_pretrained("facebook/mask2former-swin-large-coco-panoptic")
model = Mask2FormerForUniversalSegmentation.from_pretrained("facebook/mask2former-swin-large-coco-panoptic")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# model predicts class_queries_logits of shape `(batch_size, num_queries, num_labels + 1)`
# (the extra last class means "no object") and masks_queries_logits of shape
# `(batch_size, num_queries, height, width)`, at a lower resolution than the input image
class_queries_logits = outputs.class_queries_logits
masks_queries_logits = outputs.masks_queries_logits

# you can pass them to processor for postprocessing
result = processor.post_process_panoptic_segmentation(outputs, target_sizes=[image.size[::-1]])[0]
# we refer to the demo notebooks for visualization (see "Resources" section in the Mask2Former docs)
predicted_panoptic_map = result["segmentation"]
```

The code produces the segmentation map, but not a prediction score for each pixel. How can I add code to compute the prediction score of every pixel?

Motivation

No particular motivation.

Your contribution

A small contribution.

qubvel commented 1 week ago

Hi @funny000,

As far as I can see, the prediction probability is not returned for each pixel, but you can have a look at the post-processing code; this is probably what you need:

https://github.com/huggingface/transformers/blob/65bb28444849976f853063edb958b3ef3dd59d12/src/transformers/models/mask2former/image_processing_mask2former.py#L1207
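For reference, here is a minimal sketch of how a per-pixel score could be derived from the two logit tensors, loosely following the panoptic post-processing in that file: a per-query class confidence from a softmax over `class_queries_logits` (dropping queries whose best class is the last "no object" class), sigmoid mask probabilities upsampled to the image size, each mask weighted by its query's confidence, and then a max over queries at every pixel. The function name `per_pixel_scores` is hypothetical, and unlike the library code it skips the extra filtering steps (mask threshold, overlap check), so the maps may differ slightly from `post_process_panoptic_segmentation`. The score is a confidence-weighted mask probability, not a calibrated per-pixel probability.

```python
import torch
import torch.nn.functional as F


def per_pixel_scores(class_queries_logits, masks_queries_logits, target_size, threshold=0.8):
    """Hypothetical helper: per-pixel winning query, class label and score for ONE image.

    class_queries_logits: (num_queries, num_labels + 1), last column = "no object"
    masks_queries_logits: (num_queries, h, w), low-resolution mask logits
    target_size: (height, width) of the original image
    """
    num_labels = class_queries_logits.shape[-1] - 1
    # Per-query confidence and predicted class (possibly the "no object" class).
    scores, labels = class_queries_logits.softmax(-1).max(-1)
    # Keep confident queries that predict a real class.
    keep = labels.ne(num_labels) & (scores > threshold)
    height, width = target_size
    if not keep.any():
        # Nothing confident: mark every pixel as unassigned with score 0.
        empty = torch.full((height, width), -1, dtype=torch.long)
        return empty, empty.clone(), torch.zeros(height, width)
    scores, labels = scores[keep], labels[keep]
    # Turn mask logits into probabilities, then upsample to the original image size.
    mask_probs = masks_queries_logits[keep].sigmoid()
    mask_probs = F.interpolate(
        mask_probs[None], size=(height, width), mode="bilinear", align_corners=False
    )[0]
    # Weight each query's mask by its class confidence.
    prob_masks = scores.view(-1, 1, 1) * mask_probs
    # For every pixel: the best weighted probability and which kept query produced it.
    pixel_score, pixel_query = prob_masks.max(0)
    pixel_label = labels[pixel_query]
    return pixel_query, pixel_label, pixel_score
```

With the example from the question, something like `query_map, label_map, score_map = per_pixel_scores(outputs.class_queries_logits[0], outputs.masks_queries_logits[0], image.size[::-1])` should give maps at the original image resolution, and `model.config.id2label` maps the label ids to class names.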

funny000 commented 1 week ago

Thanks @qubvel. I looked at the code in image_processing_mask2former, used the mask probabilities as the prediction score, and changed the code so that post-processing outputs the mask probabilities for my work. But I have another problem: when I slide a window over a large image (large height and width) to generate a panoptic segmentation map, each tile of the large image gets a different label_id for the same object, such as a car or water. It's very odd. Why does the model output a different label_id for the same object in each tile?

qubvel commented 1 week ago

Hi @funny000, that is expected behavior: the model does not re-identify objects across images. In each image, every object is a separate instance with its own class label, nothing more. You might want to search for and explore techniques to re-identify or stitch objects across different images or frames.
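One simple way to stitch tiles (an assumption here, not part of the transformers post-processing) is to run the sliding window with some overlap between neighbouring tiles and match segment ids inside the overlapping strip by IoU. A minimal sketch; `match_segments_in_overlap`, `relabel`, the 0.5 IoU threshold and the "ids <= 0 mean no segment" convention are all hypothetical choices:

```python
import torch


def match_segments_in_overlap(seg_a, seg_b, min_iou=0.5):
    """Hypothetical helper: map segment ids of tile B onto tile A.

    seg_a, seg_b: integer id maps (H, W) cropped to the SAME overlap region of two
    neighbouring tiles. Ids <= 0 are treated as "no segment" (assumption).
    Returns {id_in_b: id_in_a} for pairs whose IoU inside the overlap >= min_iou.
    """
    mapping = {}
    for b_id in torch.unique(seg_b).tolist():
        if b_id <= 0:
            continue
        mask_b = seg_b == b_id
        best_iou, best_a = 0.0, None
        # Only ids of A that intersect this B segment can match it.
        for a_id in torch.unique(seg_a[mask_b]).tolist():
            if a_id <= 0:
                continue
            mask_a = seg_a == a_id
            inter = (mask_a & mask_b).sum().item()
            union = (mask_a | mask_b).sum().item()
            iou = inter / union
            if iou > best_iou:
                best_iou, best_a = iou, a_id
        if best_a is not None and best_iou >= min_iou:
            mapping[b_id] = best_a
    return mapping


def relabel(seg_b, mapping, offset):
    """Hypothetical helper: give matched B segments A's id and shift unmatched
    positive ids by `offset` (>= the largest id used in A) so they cannot clash."""
    out = seg_b.clone()
    positive = seg_b > 0
    out[positive] = seg_b[positive] + offset
    for b_id, a_id in mapping.items():
        out[seg_b == b_id] = a_id
    return out
```

In practice you would crop `result["segmentation"]` of both tiles to the shared strip, call `match_segments_in_overlap` on the crops, then `relabel` the second tile's full map with `offset` set to the largest id already used in the first one before pasting it into the big map. Also checking that matched segments share the same `label_id` in `segments_info` is a sensible extra guard.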
