kuangliu / pytorch-retinanet

RetinaNet in PyTorch
992 stars 250 forks

Issue with encoder.py #79

Open CluelessIT opened 3 years ago

CluelessIT commented 3 years ago

Could someone clarify whether the input_size argument of the _get_anchor_boxes function in encoder.py should be a tuple or an int?

I found that when execution reaches line 50, it is not able to assign values to fm_w and fm_h.

def _get_anchor_boxes(self, input_size):
        '''Compute anchor boxes for each feature map.
        Args:
          input_size: (tensor) model input size of (w,h).
        Returns:
          boxes: (list) anchor boxes for each feature map. Each of size [#anchors,4],
                        where #anchors = fmw * fmh * #anchors_per_cell
        '''
        num_fms = len(self.anchor_areas)
        fm_sizes = [(input_size/pow(2.,i+3)).ceil() for i in range(num_fms)]  # p3 -> p7 feature map sizes

        boxes = []
        for i in range(num_fms):
            fm_size = fm_sizes[i]
            grid_size = input_size / fm_size
            fm_w, fm_h = int(fm_size[0]), int(fm_size[1])
            xy = meshgrid(fm_w,fm_h) + 0.5  # [fm_h*fm_w, 2]
            xy = (xy*grid_size).view(fm_h,fm_w,1,2).expand(fm_h,fm_w,9,2)
            wh = self.anchor_wh[i].view(1,1,9,2).expand(fm_h,fm_w,9,2)
            box = torch.cat([xy,wh], 3)  # [x,y,w,h]
            boxes.append(box.view(-1,4))
        return torch.cat(boxes, 0)

I pass input_size as an integer, for example 448, and the resulting fm_sizes is a list of single elements, not a list of tuples. So I am confused about what the values inside fm_size should be, and more generally about the purpose of encoder.py.
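For comparison, here is a minimal sketch of what the fm_sizes line would produce if input_size were a (w, h) pair, as the docstring states. It uses plain Python and math.ceil in place of the tensor operations, the helper name feature_map_sizes is hypothetical, and the default of 5 levels is an assumption taken from the "p3 -> p7" comment.

```python
import math

def feature_map_sizes(input_size, num_fms=5):
    """Hypothetical helper mirroring the fm_sizes line in plain Python.

    input_size is a (w, h) pair, as the _get_anchor_boxes docstring describes.
    num_fms=5 is an assumption based on the "p3 -> p7" comment.
    """
    w, h = input_size
    # Level i (p3 .. p7) downsamples the input by a factor of 2**(i+3).
    return [(math.ceil(w / 2 ** (i + 3)), math.ceil(h / 2 ** (i + 3)))
            for i in range(num_fms)]

sizes = feature_map_sizes((448, 448))
print(sizes)  # [(56, 56), (28, 28), (14, 14), (7, 7), (4, 4)]

# Each entry unpacks into fm_w and fm_h, which is what the loop body expects.
for fm_w, fm_h in sizes:
    grid_size = 448 / fm_w  # width in pixels of one feature-map cell
    print(fm_w, fm_h, grid_size)
```

With a pair as input, every entry of the list has two components, so fm_size[0] and fm_size[1] are both defined. Passing a bare integer such as 448 to this sketch fails at the unpacking step, which seems consistent with the problem described above.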

If anybody is able to explain to me the purpose of it that would be great! Thank you so much!

CluelessIT commented 3 years ago

@kuangliu