When computing the pixel-text matching loss, you use cross entropy between the score map S and the ground-truth segmentation label Y.
However, the shapes of S (H4 × W4, K) and Y (H, W, 1) do not match, because S is computed from the encoder's feature map and so has a smaller height and width.
In the code, this is done through the identity head:
@HEADS.register_module()
class IdentityHead(BaseDecodeHead):
    """Panoptic Feature Pyramid Networks.
    This head is the implementation of `Semantic FPN
    <https://arxiv.org/abs/1901.02446>`_.
    Args:
        feature_strides (tuple[int]): The strides for input feature maps.
            stack_lateral. All strides suppose to be power of 2. The first
            one is of largest resolution.
    """

    def __init__(self, **kwargs):
        super(IdentityHead, self).__init__(
            input_transform=None, **kwargs)
        self.conv_seg = None

    def forward(self, inputs):
        return inputs
But I do not see where the shapes of S and Y are actually made to match.
Did you simply downsample the ground-truth map Y to match S?
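To make the question concrete, here is a minimal sketch of the two options I can think of, written in plain PyTorch rather than taken from the repository. The tensor sizes, the stride of 32, and all variable names are my own assumptions for illustration; which of the two the code actually does (if either) is exactly what I am unsure about.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes (assumptions, not from the repository):
# a 64x64 label map, a stride-32 score map, and K = 5 classes.
N, K, H, W = 2, 5, 64, 64
H4, W4 = H // 32, W // 32

torch.manual_seed(0)
score_map = torch.randn(N, K, H4, W4)   # S, reshaped to (N, K, H4, W4)
label = torch.randint(0, K, (N, H, W))  # Y, class indices at full resolution

# Option A: upsample S to the label resolution, then apply cross entropy.
score_up = F.interpolate(
    score_map, size=(H, W), mode="bilinear", align_corners=False)
loss_upsample = F.cross_entropy(score_up, label)

# Option B: downsample Y to the score-map resolution. Nearest-neighbour
# interpolation is used so that class indices are never blended.
label_down = F.interpolate(
    label[:, None].float(), size=(H4, W4), mode="nearest")
label_down = label_down[:, 0].long()
loss_downsample = F.cross_entropy(score_map, label_down)

print(score_up.shape, label_down.shape)
print(loss_upsample.item(), loss_downsample.item())
```

Option A keeps the full-resolution supervision but interpolates the scores, while Option B keeps the scores untouched but discards most label pixels.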
Hello, thank you for sharing your great work!
Thank you!