See section "2 - Modifying the model to add a different backbone" of https://colab.research.google.com/github/pytorch/vision/blob/temp-tutorial/tutorials/torchvision_finetuning_instance_segmentation.ipynb, where we create a new anchor generator.
In this particular case, you'll need to use a different backbone for your model: the anchor generator currently expects a single feature map, but you are instead giving it 5 feature maps from the FPN.
Basically, you'll want to replace these lines https://github.com/pytorch/vision/blob/e4d5003956db97d4e4bc1055ec8b045c39ee4882/torchvision/models/detection/backbone_utils.py#L52-L60 with something like

```python
return_layers = {'layer4': 3}
in_channels_stage2 = backbone.inplanes // 8
in_channels_list = [
    in_channels_stage2 * 8,
]
```
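For reference, here is a minimal sketch of the section-2 approach instead (building a single-feature-map ResNet-50 backbone yourself rather than editing backbone_utils.py); the module wiring and the string featmap_names below are assumptions that may vary with your torchvision version, not verbatim tutorial code:

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# Keep ResNet-50 up to and including layer4 (drop avgpool/fc) so the
# backbone returns a single 2048-channel feature map.
resnet = torchvision.models.resnet50(pretrained=True)
backbone = torch.nn.Sequential(*list(resnet.children())[:-2])
backbone.out_channels = 2048  # FasterRCNN reads this attribute

# One feature map -> exactly one tuple of sizes / aspect ratios.
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# Pool from the single feature map, which is registered under the key '0'
# (older torchvision versions use the integer 0 instead).
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
```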
Given that this is not a bug in torchvision, I'm closing this issue, but feel free to re-open it if you have further questions.
Sorry, I am a bit confused. If I use the "2 - Modifying the model to add a different backbone" approach, I would not need to adjust

```python
in_channels_stage2 = backbone.inplanes // 8
in_channels_list = [
    in_channels_stage2,
    in_channels_stage2 * 2,
    in_channels_stage2 * 4,
    in_channels_stage2 * 8,
]
```

since I am not using the "Method 1 - fasterrcnn_resnet50_fpn" approach? Is that correct, or am I mistaken?
What happens is that you are trying to pass a model which returns features for many layers, but the AnchorGenerator is expecting a single layer. This needs to be changed.
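To make the constraint concrete, here is a small sketch (variable names are my own) contrasting the two valid setups; the point is that the number of sizes tuples has to match the number of feature maps the backbone returns:

```python
from torchvision.models.detection.rpn import AnchorGenerator

# A backbone that returns a single feature map (e.g. just ResNet layer4)
# needs exactly one tuple of sizes and aspect ratios.
single_map_anchors = AnchorGenerator(
    sizes=((32, 64, 128, 256, 512),),
    aspect_ratios=((0.5, 1.0, 2.0),),
)

# The default ResNet-50 FPN backbone returns 5 feature maps, so it needs
# 5 tuples of sizes (and matching aspect ratios).
fpn_anchors = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)
```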
Thank you for your reply. I see what you mean, I misread/misunderstood your reply initially. I have made the changes you suggested and I get the following error:
```
167     pred_boxes = self.decode_single(
--> 168         rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
169     )
170     return pred_boxes.reshape(sum(boxes_per_image), -1, 4)

RuntimeError: shape '[22500, -1]' is invalid for input of size 114336
```
Does that mean I would need to change the size of the output of layer4
to match the 114336 size?
You'll need to change the number of input channels to the RPN as well.
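For what it's worth, here is a sketch of that change (assuming `backbone`, `anchor_generator` and `model` are the objects from your own code, and that the backbone exposes an out_channels attribute as the FasterRCNN constructor expects):

```python
from torchvision.models.detection.rpn import RPNHead

# The RPN head must match both the backbone's channel count (e.g. 2048 for a
# raw ResNet-50 layer4 output, 256 for the FPN) and the number of anchors
# per spatial location produced by the anchor generator.
model.rpn.head = RPNHead(backbone.out_channels,
                         anchor_generator.num_anchors_per_location()[0])
```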
Thank you again. I have made the amendments. However, I get the following error:
```
RuntimeError: Error(s) in loading state_dict for FasterRCNN:
    Unexpected key(s) in state_dict: "backbone.fpn.inner_blocks.1.weight", "backbone.fpn.inner_blocks.1.bias", "backbone.fpn.inner_blocks.2.weight", "backbone.fpn.inner_blocks.2.bias", "backbone.fpn.inner_blocks.3.weight", "backbone.fpn.inner_blocks.3.bias", "backbone.fpn.layer_blocks.1.weight", "backbone.fpn.layer_blocks.1.bias", "backbone.fpn.layer_blocks.2.weight", "backbone.fpn.layer_blocks.2.bias", "backbone.fpn.layer_blocks.3.weight", "backbone.fpn.layer_blocks.3.bias".
    size mismatch for backbone.fpn.inner_blocks.0.weight: copying a param with shape torch.Size([256, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([256, 2048, 1, 1]).
```
It seems to work if I don't implement this change:

```python
return_layers = {'layer4': 3}
in_channels_stage2 = backbone.inplanes // 8
in_channels_list = [
    in_channels_stage2 * 8,
]
```
This is the method I am using to make the changes:

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead

model = fasterrcnn_resnet50_fpn(pretrained=True)
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))
model.rpn.anchor_generator = anchor_generator
model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)
```
However, when I set:

```python
anchor_generator = AnchorGenerator(sizes=((128, 256, 512),),
                                   aspect_ratios=((1.0, 2.0),))
```

it attempts to run and then, after some time, it outputs this error: `Process finished with exit code 137 (interrupted by signal 9: SIGKILL)`
Sorry, I am also a bit confused as to what you mean by

> You'll need to change the number of input channels to the RPN as well.

Are you referring to this? If so, how would I change the input channels that you are referring to?

```python
model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])
```
You can't use all the pre-trained weights if you change the model architecture.
> Are you referring to this? If so, how would I change the input channels that you are referring to?

From the torchvision colab tutorial I pointed to just above, you can do something like the sketch below: build the RPN pieces yourself and pass them to the constructor of FasterRCNN, see
https://github.com/pytorch/vision/blob/09823951fec09215f9efc5d5d31456763da5ae04/torchvision/models/detection/faster_rcnn.py#L187-L189
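For example, a sketch of that approach (assuming the standard ResNet-50 FPN backbone; note that resnet_fpn_backbone's signature has changed a bit across torchvision versions):

```python
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone
from torchvision.models.detection.faster_rcnn import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead

# The ResNet-50 FPN backbone returns 5 feature maps, each with 256 channels,
# so we need 5 sizes tuples and an RPN head with 256 input channels.
backbone = resnet_fpn_backbone('resnet50', pretrained=True)
anchor_generator = AnchorGenerator(
    sizes=((32,), (64,), (128,), (256,), (512,)),
    aspect_ratios=((0.5, 1.0, 2.0),) * 5,
)
rpn_head = RPNHead(backbone.out_channels,
                   anchor_generator.num_anchors_per_location()[0])

# Passing both pieces to the constructor keeps all internal shapes consistent,
# instead of patching model.rpn on an already-built model.
model = FasterRCNN(backbone,
                   num_classes=5,
                   rpn_anchor_generator=anchor_generator,
                   rpn_head=rpn_head)
```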
> It attempts to run and then after some time it will output this error: Process finished with exit code 137 (interrupted by signal 9: SIGKILL)

This error is weird, and might indicate that the model you created was too big for your machine, so when trying to run it the system killed the process.
@fmassa
I came across a similar situation. Keeping the rest intact, I tried different anchor_sizes:

```python
anchor_sizes = ((8,), (12,), (16,))                       # error
anchor_sizes = ((8,), (12,), (16,), (24,))                # error
anchor_sizes = ((8,), (12,), (16,), (24,), (32,))         # ok
anchor_sizes = ((8,), (12,), (16,), (24,), (32,), (64,))  # ok
```
It seems that whenever anchor_sizes has fewer than 5 tuples, it raises an error like the following:

```
rel_codes.reshape(sum(boxes_per_image), -1), concat_boxes
RuntimeError: shape '[508032, -1]' is invalid for input of size 2062368
```
By the way, I use Mask R-CNN:
```python
anchor_sizes = ((8,), (12,), (16,), (24,), (32,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
rpn_anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)

model = mask_rcnn_model.maskrcnn_resnet50_fpn(rpn_anchor_generator=rpn_anchor_generator,
                                              pretrained=False)
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
hidden_layer = 256
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                   hidden_layer,
                                                   num_classes)
```
@feipeng-wang the problem you are facing is that the number of anchors you have (specified by the `rpn_anchor_generator`) is not compatible with the number of predictions from the RPN in `model.rpn.head`.
If you reduce the number of anchors, you also need to reduce the number of feature maps that are passed to the RPN head, by changing the number of feature maps the backbone returns.
But I'm thinking that I should probably improve the error message to be more explicit about it.
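A quick way to see the constraint for yourself (a sketch, not code from this thread; the dict keys differ slightly across torchvision versions):

```python
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=False)
features = model.backbone(torch.rand(1, 3, 224, 224))

# The FPN backbone returns an OrderedDict of feature maps; its length is the
# number of sizes tuples the AnchorGenerator needs (5 here, including 'pool').
print(list(features.keys()), len(features))
```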
After spending a few hours, I finally got what you mean. Initially, I was confused by the connection between feature maps and anchor sizes. Finally, I found out the 'why':

```python
def grid_anchors(self, grid_sizes, strides):
    anchors = []
    for size, stride, base_anchors in zip(
        grid_sizes, strides, self.cell_anchors
    ):
```

where `len(grid_sizes)` and `len(strides)` equal the number of feature maps, while `len(self.cell_anchors)` equals the number of base anchor sizes. However, I still don't understand why it has to be written this way. Thanks.
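If it helps, one reading of why a mismatch here surfaces as a reshape error rather than a clear message: zip silently truncates to its shortest input, so fewer anchors are generated than the RPN head predicts. A toy illustration (made-up numbers, not torchvision code):

```python
# Five feature-map grid sizes, but only three sets of base anchors.
grid_sizes = [(200, 272), (100, 136), (50, 68), (25, 34), (13, 17)]
cell_anchors = ['anchors_for_level_0', 'anchors_for_level_1', 'anchors_for_level_2']

# zip pairs level i with cell_anchors[i] and silently drops levels 3 and 4,
# so the anchor count no longer matches the number of RPN predictions.
for grid_size, base_anchors in zip(grid_sizes, cell_anchors):
    print(grid_size, base_anchors)
```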
@feipeng-wang if you have a better way of representing this, I'd love to hear your thoughts on how to simplify / improve things.
But I think a good first step is to have a better error message, which I'm tracking in #1539
> After spending a few hours, I finally got what you mean: [...] `len(grid_sizes)` and `len(strides)` equal the number of feature maps, while `len(self.cell_anchors)` equals the number of base anchor sizes. However, I still don't understand why it has to be written this way.

@feipeng-wang Sorry to bother you, did you solve this problem? I have the same question about this part.
As the comment in rpn.py mentions:
https://github.com/pytorch/vision/blob/505cd6957711af790211896d32b40291bea1bc21/torchvision/models/detection/rpn.py#L116

```python
# For every combination of (a, (g, s), i) in (self.cell_anchors, zip(grid_sizes, strides), 0:2),
```

So, more specifically, I think the right logic here would be something like this:

```python
for size, stride in zip(grid_sizes, strides):
    for base_anchors in self.cell_anchors:
        pass
```

Any ideas? Thanks!
> @feipeng-wang Sorry to bother you, did you solve this problem? I have the same question about this part. [...] So more specifically I think the right logic here is something like below: `for size, stride in zip(grid_sizes, strides): for base_anchors in self.cell_anchors: pass` Any ideas? Thanks!
It's been a long time, and it seems the maintainers did make an update since then, so let me see if I can recall. How do you set your anchor sizes?
Not sure if this helps, but this is how I set my anchors/aspect ratios:

```python
anchor_generator = AnchorGenerator(sizes=tuple([(32, 64, 128) for _ in range(5)]),
                                   aspect_ratios=tuple([(1.0, 2.0) for _ in range(5)]))
```

This method worked for me on one dataset, but not on another; both datasets were for pedestrian detection.
@feipeng-wang @17sarf Sorry for the late reply, and thank you so much. I solved my problem by taking a look at the paper Feature Pyramid Networks for Object Detection:

> 4.1. Feature Pyramid Networks for RPN ... Because the head slides densely over all locations in all pyramid levels, it is not necessary to have multi-scale anchors on a specific level. Instead, we assign anchors of a single scale to each level.

So 5 size tuples (one per pyramid level) are required if we want to use the default backbone.
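For reference, this is essentially the shape of torchvision's default anchor configuration for the FPN models (a sketch; the exact defaults live in faster_rcnn.py):

```python
from torchvision.models.detection.rpn import AnchorGenerator

# One scale per pyramid level, with the same aspect ratios at every level.
anchor_sizes = ((32,), (64,), (128,), (256,), (512,))
aspect_ratios = ((0.5, 1.0, 2.0),) * len(anchor_sizes)
anchor_generator = AnchorGenerator(anchor_sizes, aspect_ratios)
```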
@soloist97 Sorry, do you mean I should be using something like this:

```python
anchor_generator = AnchorGenerator(sizes=((8,), (16,), (32,), (64,), (128,)),
                                   aspect_ratios=((1.0, 2.0),) * 5)
```

instead of this format:

```python
anchor_generator = AnchorGenerator(sizes=tuple([(32, 64, 128) for _ in range(5)]),
                                   aspect_ratios=tuple([(1.0, 2.0) for _ in range(5)]))
```
@17sarf Yes, I guess.
As long as we guarantee `len(sizes) == 5` (5 tuples in `sizes`), it's OK to use any size and ratio settings. But according to the FPN paper, the first way is the better choice.
So maybe

```python
anchor_generator = AnchorGenerator(sizes=((8,), (16,), (32,), (64,), (128,)),
                                   aspect_ratios=((1.0, 2.0),) * 5)
```

can be faster in computation without performance loss?
Thanks @soloist97! I will play around with it and let you know if I discover anything.
@fmassa I just realised I never thanked you for all of your help earlier. Please accept my apologies and a belated thank you.
I would like to use the following with `fasterrcnn_resnet50_fpn`, but I get this error while attempting to train the model:

This is my implementation:

These anchor scales and aspect ratios are based on this paper: https://arxiv.org/abs/1506.01497

Any advice would be greatly appreciated.
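A hedged sketch of how the paper-style anchors (scales 128/256/512, ratios 1:2, 1:1, 2:1 from the Faster R-CNN paper) could be mapped onto the default 5-level FPN model, with the RPN head rebuilt to match; this is an illustration rather than the implementation referenced above, and num_classes=5 is an assumption carried over from the earlier snippet:

```python
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.rpn import AnchorGenerator, RPNHead

model = fasterrcnn_resnet50_fpn(pretrained=True)

# Paper-style scales and ratios, repeated once per FPN level so that
# len(sizes) matches the 5 feature maps the backbone returns.
anchor_generator = AnchorGenerator(sizes=((128, 256, 512),) * 5,
                                   aspect_ratios=((0.5, 1.0, 2.0),) * 5)
model.rpn.anchor_generator = anchor_generator

# Rebuild the RPN head so its per-location prediction count matches the
# new anchor configuration (the FPN feature maps have 256 channels).
model.rpn.head = RPNHead(256, anchor_generator.num_anchors_per_location()[0])

# Replace the box predictor for the target number of classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=5)
```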