pytorch / vision

Datasets, Transforms and Models specific to Computer Vision
https://pytorch.org/vision
BSD 3-Clause "New" or "Revised" License

Change anchor sizes #1308

Closed 17sarf closed 5 years ago

17sarf commented 5 years ago

I would like to use the following with fasterrcnn_resnet50_fpn:

AnchorGenerator(sizes=((128, 256, 512),),
                aspect_ratios=((2, 1.0, 0.5),))

But I get this error while attempting to train the model:

line 168, in decode
    rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
RuntimeError: shape '[1440000, -1]' is invalid for input of size 7674336

This is my implementation:

# imports assumed for this snippet (module paths as of the torchvision version used here)
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FasterRCNN, FastRCNNPredictor, model_urls
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.models.utils import load_state_dict_from_url

def fasterrcnn_resnet50_fpn(pretrained=False, progress=True,
                            num_classes=91, pretrained_backbone=True, **kwargs):
    if pretrained:
        pretrained_backbone = False
    backbone = resnet_fpn_backbone('resnet50', pretrained_backbone)
    anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
                                       aspect_ratios=((2, 1.0, 0.5),))
    model = FasterRCNN(backbone, num_classes, rpn_anchor_generator=anchor_generator, **kwargs)
    if pretrained:
        state_dict = load_state_dict_from_url(model_urls['fasterrcnn_resnet50_fpn_coco'],
                                              progress=progress)
        model.load_state_dict(state_dict)
    return model

def get_fasterrcnn_resnet50_fpn_model(num_classes, device):

    model = fasterrcnn_resnet50_fpn(pretrained=False)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
    return model.to(device)

These anchor scales and aspect ratios are based on this paper: https://arxiv.org/abs/1506.01497

Any advice would be greatly appreciated.

fmassa commented 5 years ago

See section 2 - Modifying the model to add a different backbone of https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb , where we create a new anchor generator.
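For reference, a minimal sketch along the lines of that tutorial section (assuming a recent torchvision where the feature-map names passed to MultiScaleRoIAlign are strings): a backbone that returns a single feature map only needs a single tuple of sizes and a single tuple of aspect ratios in the AnchorGenerator.

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import MultiScaleRoIAlign

# a backbone that returns a single feature map
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
backbone.out_channels = 1280  # FasterRCNN needs to know the channel count

# one tuple of sizes and one tuple of ratios -> anchors for exactly one feature map
anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# pool from the single feature map, which is named '0' by default
roi_pooler = MultiScaleRoIAlign(featmap_names=['0'], output_size=7, sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)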

In this particular case, you'll need a different backbone for your model: your anchor generator currently expects a single feature map, but the FPN backbone is instead giving it 5 levels.

Basically, you'll want to replace those lines https://github.com/pytorch/vision/blob/e4d5003956db97d4e4bc1055ec8b045c39ee4882/torchvision/models/detection/backbone_utils.py#L52-L60 with something like

return_layers = {'layer4': 3}

in_channels_stage2 = backbone.inplanes // 8
in_channels_list = [
    in_channels_stage2 * 8,
]
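Putting that together, a sketch of the modified helper could look like the following. This is only a sketch under a few assumptions: BackboneWithFPN is importable from torchvision.models.detection.backbone_utils, return_layers maps layer names to output names, and the helper name resnet50_layer4_fpn_backbone is made up for illustration. Note also that the FPN helper appends a max-pooled 'pool' level on top, so the backbone below returns two feature maps, and the anchor generator then needs one size tuple (and one aspect-ratio tuple) per map.

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import BackboneWithFPN
from torchvision.models.detection.rpn import AnchorGenerator
from torchvision.ops import misc as misc_nn_ops

def resnet50_layer4_fpn_backbone(pretrained=True):
    backbone = torchvision.models.resnet50(pretrained=pretrained,
                                           norm_layer=misc_nn_ops.FrozenBatchNorm2d)
    return_layers = {'layer4': '0'}               # only return the last stage
    in_channels_stage2 = backbone.inplanes // 8   # 256 for resnet50
    in_channels_list = [in_channels_stage2 * 8]   # 2048, the channel count of layer4
    return BackboneWithFPN(backbone, return_layers, in_channels_list, out_channels=256)

backbone = resnet50_layer4_fpn_backbone()
# two feature maps ('0' plus the extra 'pool' level) -> two size tuples;
# the pool-level sizes are only illustrative
anchor_generator = AnchorGenerator(sizes=((128, 256, 512), (256, 512, 1024)),
                                   aspect_ratios=((0.5, 1.0, 2.0),) * 2)
model = FasterRCNN(backbone, num_classes=91,
                   rpn_anchor_generator=anchor_generator)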

Given that this is not a bug in torchvision, I'm closing this issue, but feel free to re-open it if you have further questions.

17sarf commented 5 years ago

Sorry, I am a bit confused. If I use the "2 - Modifying the model to add a different backbone" approach, I would not need to adjust

 in_channels_stage2 = backbone.inplanes // 8 
 in_channels_list = [ 
     in_channels_stage2, 
     in_channels_stage2 * 2, 
     in_channels_stage2 * 4, 
     in_channels_stage2 * 8, 
 ] 

since I am not using the "Method 1 - fasterrcnn_resnet50_fpn" approach? Is that correct, or am I mistaken?

fmassa commented 5 years ago

What happens is that you are trying to pass a model which returns features for many layers, but the AnchorGenerator is expecting a single layer. This needs to be changed.
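In other words, the number of size tuples (and aspect-ratio tuples) given to AnchorGenerator has to match the number of feature maps the backbone returns. A minimal illustration (the concrete values are just examples):

from torchvision.models.detection.rpn import AnchorGenerator

# FPN backbone: 5 feature maps -> 5 tuples, one set of scales per map
fpn_anchors = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5)

# single-feature-map backbone: one tuple holding all the scales
single_map_anchors = AnchorGenerator(
    sizes=((128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),))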

17sarf commented 5 years ago

Thank you for your reply. I see what you mean; I misread/misunderstood your reply initially. I have made the changes you suggested, and now I get the following error:

    167         pred_boxes = self.decode_single(
--> 168             rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
    169         )
    170         return pred_boxes.reshape(sum(boxes_per_image), -1, 4)

RuntimeError: shape '[22500, -1]' is invalid for input of size 114336

Does that mean I would need to change the size of the output of layer4 to match the 114336 size?

fmassa commented 5 years ago

You'll need to change the number of input channels to the RPN as well.
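Concretely, the "input channels" here is the channel count of the feature maps the backbone hands to the RPN (256 for the FPN backbone, or 2048 if you return layer4 of resnet50 directly). A small sketch, assuming backbone and anchor_generator are the objects built earlier:

from torchvision.models.detection.rpn import RPNHead

# in_channels must match the backbone's output feature maps,
# num_anchors must match what the anchor generator places per location
rpn_head = RPNHead(backbone.out_channels,
                   anchor_generator.num_anchors_per_location()[0])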

17sarf commented 5 years ago

Thank you again. I have made the amendments. However, I get the following error:

RuntimeError: Error(s) in loading state_dict for FasterRCNN:
    Unexpected key(s) in state_dict: "backbone.fpn.inner_blocks.1.weight", "backbone.fpn.inner_blocks.1.bias", "backbone.fpn.inner_blocks.2.weight", "backbone.fpn.inner_blocks.2.bias", "backbone.fpn.inner_blocks.3.weight", "backbone.fpn.inner_blocks.3.bias", "backbone.fpn.layer_blocks.1.weight", "backbone.fpn.layer_blocks.1.bias", "backbone.fpn.layer_blocks.2.weight", "backbone.fpn.layer_blocks.2.bias", "backbone.fpn.layer_blocks.3.weight", "backbone.fpn.layer_blocks.3.bias". 
    size mismatch for backbone.fpn.inner_blocks.0.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 2048, 1, 1]).

It seems to work if I don't implement this change:

return_layers = {'layer4': 3}

in_channels_stage2 = backbone.inplanes // 8
in_channels_list = [
    in_channels_stage2 * 8,
]

This is the method I am using to make the changes:

    model = fasterrcnn_resnet50_fpn(pretrained=True)
    anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                       aspect_ratios=((0.5, 1.0, 2.0),))
    model.rpn.anchor_generator = anchor_generator
    model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)

However, when I set:

anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
                                   aspect_ratios=((1.0, 2.0),))

It attempts to run and then after some time it will output this error: Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

17sarf commented 5 years ago

Sorry, I am also a bit confused as to what you mean by

You'll need to change the number of input channels to the RPN as well.

Are you referring to this? If so, how would I change the input channels that you are referring to?

model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])

fmassa commented 5 years ago

You can't use all the pre-trained weights if you change the model architecture.

Are you referring to this? If so, how would I change the input channels that you are referring to?

From the torchvision colab tutorial I pointed to just above, you can do something similar: build the rpn yourself and pass it to the constructor of FasterRCNN, see https://github.com/pytorch/vision/blob/09823951fec09215f9efc5d5d31456763da5ae04/torchvision/models/detection/faster_rcnn.py#L187-L189
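For example, a hedged sketch of wiring a custom anchor generator and RPN head into the constructor, reusing the same helpers that appear earlier in this thread (the anchor values are only illustrative):

from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead

backbone = resnet_fpn_backbone('resnet50', True)  # 5 feature maps, 256 channels each

# one size tuple per feature map
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

# build the RPN head to match the backbone channels and the anchor count
rpn_head = RPNHead(backbone.out_channels,
                   anchor_generator.num_anchors_per_location()[0])

model = FasterRCNN(backbone, num_classes=5,
                   rpn_anchor_generator=anchor_generator,
                   rpn_head=rpn_head)

As noted above, the COCO-pretrained detection weights won't fully load once the RPN architecture changes, so this builds the detector from the (optionally pretrained) backbone instead.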

It attempts to run and then after some time it will output this error: Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

This error is weird, and might indicate that the model you created was too big for your machine, so the system killed the process when you tried to run it.

feipeng-wang commented 5 years ago

@fmassa I came across a similar situation. Keeping the rest intact, I tried different anchor_sizes:

anchor_sizes = ((8,), (12,), (16,)) ---- error
anchor_sizes = ((8,), (12,), (16,), (24,)) ---- error
anchor_sizes = ((8,), (12,), (16,), (24,), (32,)) ---- ok
anchor_sizes = ((8,), (12,), (16,), (24,), (32,), (64,)) ---- ok

It seems that only when len(anchor_sizes) is smaller than 5 does it spill an error like the following:

    rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
RuntimeError: shape '[508032, -1]' is invalid for input of size 2062368

By the way, I use Mask R-CNN:

anchor_sizes = ((8,), (12,), (16,), (24,), (32,))

anchor_sizes = ((8,), (16,), (32,))

aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
rpn_anchor_generator = AnchorGenerator(
    anchor_sizes, aspect_ratios
)

model = mask_rcnn_model.maskrcnn_resnet50_fpn(rpn_anchor_generator=rpn_anchor_generator, pretrained=False)

in_features = model.roi_heads.box_predictor.cls_score.in_features

model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256

model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                   hidden_layer,
                                                   num_classes)
fmassa commented 5 years ago

@feipeng-wang the problem you are facing is that the number of anchors you have (specified by the rpn_anchor_generator) is not compatible with the number of predictions from the RPN in model.rpn.head. If you reduce the number of anchors, you also need to reduce the number of features that are passed to the RPN heads, by changing the number of feature maps from the backbone that you pass.

But I'm thinking that I should probably improve the error message to be more explicit about it.
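A quick, hypothetical sanity check for that compatibility, relying on the fact that the RPN head's cls_logits convolution predicts one objectness score per anchor per location:

# number of anchors the generator places at each location ...
num_anchors = model.rpn.anchor_generator.num_anchors_per_location()[0]
# ... must equal the number of objectness scores the RPN head predicts there
assert model.rpn.head.cls_logits.out_channels == num_anchors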

feipeng-wang commented 5 years ago

After spending a few hours, I finally got what you mean. Initially, I was confused by the connection between feature maps and anchor sizes. Finally, I found out the 'why':

def grid_anchors(self, grid_sizes, strides):
        anchors = []
        for size, stride, base_anchors in zip(
            grid_sizes, strides, self.cell_anchors   
        ):

where len(grid_sizes) and len(strides) equal the number of feature maps, while len(cell_anchors) equals the number of base anchor sizes. However, I still don't understand why it has to be written this way. Thanks.
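The practical consequence of that zip is easy to see in isolation: zip stops at the shortest argument, so if cell_anchors has fewer entries (i.e. fewer size tuples) than there are feature maps, the later maps silently get no anchors, and the total anchor count no longer matches the number of predictions coming out of the RPN head, which is what surfaces as the reshape error. A toy illustration with made-up placeholders:

# purely illustrative: 5 feature maps but only 3 size tuples
grid_sizes = ['map0', 'map1', 'map2', 'map3', 'map4']
cell_anchors = ['sizes0', 'sizes1', 'sizes2']

print(list(zip(grid_sizes, cell_anchors)))
# -> only 3 pairs; 'map3' and 'map4' are dropped without any error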

fmassa commented 5 years ago

@feipeng-wang if you have a better way of representing this, I'd love to hear your thoughts on how to simplify / improve things.

But I think a good first step is to have a better error message, which I'm tracking in #1539

soloist97 commented 4 years ago

After spending a few hours, I finally got what you mean. Initially, I was confused by the connection between feature maps and anchor sizes. Finally, I found out the 'why':

def grid_anchors(self, grid_sizes, strides):
        anchors = []
        for size, stride, base_anchors in zip(
            grid_sizes, strides, self.cell_anchors   
        ):

where len(grid_sizes) and len(strides) equal the number of feature maps, while len(cell_anchors) equals the number of base anchor sizes. However, I still don't understand why it has to be written this way. Thanks.

@feipeng-wang Sorry to bother you, did you solve this problem? I have the same question with this part.

As the comment in rpn.py mentioned: https://github.com/pytorch/vision/blob/505cd6957711af790211896d32b40291bea1bc21/torchvision/models/detection/rpn.py#L116

So, more specifically, I think the right logic here would be something like this:

for size, stride in zip(grid_sizes, strides):
    for base_anchors in self.cell_anchors:
        pass

Any ideas? Thanks!

feipeng-wang commented 4 years ago

@soloist97


It's been a long time; it seems the owner did make an update since then, so let me see if I can recall. How do you set your anchor sizes?

17sarf commented 4 years ago

Not sure if this helps, but this is how I set my anchors/aspect ratios:

anchor_generator = AnchorGenerator(sizes=tuple([(32, 64, 128) for _ in range(5)]), aspect_ratios=tuple([(1.0, 2.0) for _ in range(5)]))

This method worked for me using a particular dataset, but not another one. Both datasets were for pedestrian detection.

soloist97 commented 4 years ago

@feipeng-wang @17sarf Sorry for the late reply, and thank you so much. I solved my problem by taking a look at the paper Feature Pyramid Networks for Object Detection:

4.1. Feature Pyramid Networks for RPN ... Because the head slides densely over all locations in all pyramid levels, it is not necessary to have multi-scale anchors on a specific level. Instead, we assign anchors of a single scale to each level.

So 5 size tuples (one per feature map) are required if we want to use the default FPN backbone.
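For reference, this is also how torchvision's own faster_rcnn.py sets up the default FPN anchors: a single scale per level, with the aspect ratios repeated for every level.

from torchvision.models.detection.rpn import AnchorGenerator

# defaults used by fasterrcnn_resnet50_fpn (one scale per FPN level)
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
rpn_anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)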

17sarf commented 4 years ago

@soloist97 sorry, do you mean I should be using something like this:

anchor_generator = AnchorGenerator(sizes=((8,), (16,), (32,), (64,), (128,)),
                                   aspect_ratios=((1.0, 2.0),) * 5)

instead of this format:

anchor_generator = AnchorGenerator(sizes=tuple([(32, 64, 128) for _ in range(5)]), aspect_ratios=tuple([(1.0, 2.0) for _ in range(5)]))

soloist97 commented 4 years ago

@17sarf Yes, I guess.

As long as we guarantee len(sizes) == 5 (5 tuples in sizes), it's OK to use any size and ratio settings. But according to the FPN paper, the first way is the better choice.

So maybe anchor_generator = AnchorGenerator(sizes=((8,), (16,), (32,), (64,), (128,)), aspect_ratios=((1.0, 2.0),) * 5) can be faster in computation without a performance loss?
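That intuition can be checked directly: with a single scale per level, the generator places fewer anchors at every location, so the RPN has fewer predictions to score and decode. A small sketch:

from torchvision.models.detection.rpn import AnchorGenerator

single_scale = AnchorGenerator(sizes=((8,), (16,), (32,), (64,), (128,)),
                               aspect_ratios=((1.0, 2.0),) * 5)
multi_scale = AnchorGenerator(sizes=((32, 64, 128),) * 5,
                              aspect_ratios=((1.0, 2.0),) * 5)

print(single_scale.num_anchors_per_location())  # [2, 2, 2, 2, 2]
print(multi_scale.num_anchors_per_location())   # [6, 6, 6, 6, 6]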

17sarf commented 4 years ago

Thanks @soloist97! I will play around with it and let you know if I discover anything.

17sarf commented 4 years ago

@fmassa I just realised I never thanked you for all of your help earlier. Please accept my apologies for this, and a belated thank you.