Hi,
You can take a look at this notebook: https://github.com/NielsRogge/Transformers-Tutorials/blob/master/SegFormer/Fine_tune_SegFormer_on_custom_dataset.ipynb
It shows how to fine-tune SegFormer on a custom dataset (BeiT works the same way). SegFormer also outputs logits of shape (batch_size, num_labels, height/4, width/4); one interpolates these back to the original size of the image.
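For reference, here's a minimal sketch of that interpolation step, assuming `logits` comes from the model and `image` is the original PIL image (both names are illustrative):

```python
import torch.nn.functional as F

# logits: (batch_size, num_labels, height/4, width/4)
# PIL's image.size is (width, height), so reverse it to get (height, width)
upsampled_logits = F.interpolate(
    logits,
    size=image.size[::-1],  # original image size, not the resized one
    mode="bilinear",
    align_corners=False,
)
# per-pixel class ids at the original resolution
predicted_segmentation = upsampled_logits.argmax(dim=1)
```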
Note that we are in the process of determining generic outputs for semantic segmentation models, so it might be that in the future, the logits will automatically have the same size as the original pixel_values.
Thank you.
From the documentation, it says that the logits shape will be (batch_size, num_labels, height/4, width/4). I assume the logits are the output masks of the model (since I'm doing segmentation). How do I convert this shape (height/4, width/4) back to the original image's shape, i.e., before it was resized to (height, width)?
> # logits are of shape (batch_size, num_labels, height/4, width/4)

I realized that the input image is resized to the shape (height, width) by the BeitFeatureExtractor object, where height and width are the constant values defined in the BeitFeatureExtractor's config. This means the outputs' shapes are not the original image's shape, but rather the resized shape divided by 4.
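To make that concrete, here is a hedged end-to-end sketch; the ADE20k checkpoint and the image path are illustrative assumptions. The point is to pass the original image size to the interpolation, not the feature extractor's fixed size:

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import BeitFeatureExtractor, BeitForSemanticSegmentation

checkpoint = "microsoft/beit-base-finetuned-ade-640-640"  # illustrative checkpoint
feature_extractor = BeitFeatureExtractor.from_pretrained(checkpoint)
model = BeitForSemanticSegmentation.from_pretrained(checkpoint)

image = Image.open("example.jpg")  # hypothetical path

# the feature extractor resizes every image to the fixed (height, width)
# from its config, regardless of the original image size
inputs = feature_extractor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_labels, resized_height/4, resized_width/4)

# interpolate straight to the original image size to undo both
# the /4 downsampling and the feature extractor's resize
upsampled = F.interpolate(
    logits,
    size=image.size[::-1],  # PIL size is (width, height)
    mode="bilinear",
    align_corners=False,
)
segmentation_mask = upsampled.argmax(dim=1)[0]  # (orig_height, orig_width)
```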