roytseng-tw / Detectron.pytorch

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.
MIT License
2.82k stars, 567 forks

RuntimeWarning: Negative areas founds: 3 warnings.warn("Negative areas founds: %d" % neg_area_idx.size, RuntimeWarning)) #172

Open PkuRainBow opened 5 years ago

PkuRainBow commented 5 years ago

I added a non-local block to the backbone and trained the network following "e2e_mask_rcnn_R-50-RPN_1x.yaml".

However, I am not sure how to set the learning rate for the newly introduced non-local block. I left the weight-initialization code and the learning-rate settings unchanged.
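One common way to give a newly added block its own learning rate is through optimizer parameter groups: PyTorch optimizers accept a list of dicts, each with a `params` entry and optional per-group settings such as `lr`. The sketch below only builds that list from `(name, param)` pairs, so it runs without PyTorch. The function name, the `marker` substring, and the `lr_mult` value are hypothetical and not taken from this repository, and the sketch does not claim that a different learning rate fixes the problem reported here.

```python
def build_param_groups(named_params, marker, base_lr, lr_mult=1.0):
    """Split (name, param) pairs into two optimizer groups.

    Parameters whose name contains `marker` go into a separate group
    trained at base_lr * lr_mult. `marker` and `lr_mult` are
    hypothetical knobs used for illustration only.
    """
    new_params, other_params = [], []
    for name, param in named_params:
        if marker in name:
            new_params.append(param)
        else:
            other_params.append(param)
    groups = [
        {"params": other_params, "lr": base_lr},
        {"params": new_params, "lr": base_lr * lr_mult},
    ]
    # Drop empty groups so the optimizer only sees groups with parameters.
    return [g for g in groups if g["params"]]


if __name__ == "__main__":
    # Stand-in names and values; a real model would supply
    # model.named_parameters() instead.
    demo = [("res4.0.conv1.weight", "w0"), ("res4.6.f_key.weight", "w1")]
    for group in build_param_groups(demo, "res4.6.", base_lr=0.02, lr_mult=0.1):
        print(len(group["params"]), group["lr"])
```

With PyTorch installed, the returned list could be passed as the first argument to an optimizer such as `torch.optim.SGD`, with momentum and weight decay supplied as usual.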

Here are the changes I made.

First, I insert a non-local module inside the function add_stage_w_non_local:


def add_stage_w_non_local(inplanes, outplanes, innerplanes, nblocks, dilation=1, stride_init=2, choice=1):
    """Make a stage consisting of `nblocks` residual blocks followed by one context block.
    Returns:
        - stage module: an nn.Sequential module of residual blocks plus the context block
        - final output dimension
    """
    res_blocks = []
    stride = stride_init
    for _block_id in range(nblocks):
        res_blocks.append(add_residual_block(
            inplanes, outplanes, innerplanes, dilation, stride
        ))
        inplanes = outplanes
        stride = 1

    # Append one context (non-local) block after the last residual block.
    if choice == 1:
        context_cls = BaseOC_Module
    elif choice == 2:
        context_cls = GN_BaseOC_Module
    elif choice == 3:
        context_cls = BaseOC_Context_Module
    elif choice == 4:
        context_cls = GN_BaseOC_Context_Module
    else:
        # Fail early instead of hitting an undefined context_block below.
        raise ValueError("choice must be 1, 2, 3, or 4, got %r" % (choice,))
    context_block = context_cls(
        in_channels=outplanes, out_channels=outplanes, key_channels=outplanes // 2,
        value_channels=outplanes // 2, dropout=0, sizes=([1]))

    return nn.Sequential(*res_blocks, context_block), outplanes

Then I modify the backbone definition as shown below, replacing add_stage with add_stage_w_non_local when building self.res4.

class NL_ResNet_convX_body(nn.Module):
    def __init__(self, block_counts, choice=1):
        super().__init__()
        self.block_counts = block_counts
        self.convX = len(block_counts) + 1
        self.num_layers = (sum(block_counts) + 3 * (self.convX == 4)) * 3 + 2

        self.res1 = globals()[cfg.RESNETS.STEM_FUNC]()
        dim_in = 64
        dim_bottleneck = cfg.RESNETS.NUM_GROUPS * cfg.RESNETS.WIDTH_PER_GROUP
        self.res2, dim_in = add_stage(dim_in, 256, dim_bottleneck, block_counts[0],
                                      dilation=1, stride_init=1)
        self.res3, dim_in = add_stage(dim_in, 512, dim_bottleneck * 2, block_counts[1],
                                      dilation=1, stride_init=2)
        self.res4, dim_in = add_stage_w_non_local(dim_in, 1024, dim_bottleneck * 4, block_counts[2],
                                      dilation=1, stride_init=2, choice=choice)
        if len(block_counts) == 4:
            stride_init = 2 if cfg.RESNETS.RES5_DILATION == 1 else 1
            self.res5, dim_in = add_stage(dim_in, 2048, dim_bottleneck * 8, block_counts[3],
                                          cfg.RESNETS.RES5_DILATION, stride_init)
            self.spatial_scale = 1 / 32 * cfg.RESNETS.RES5_DILATION
        else:
            self.spatial_scale = 1 / 16  # final feature scale wrt. original image scale

        self.dim_out = dim_in

        self._init_modules()
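The `num_layers` and `spatial_scale` expressions in the constructor are terse, so here is a small standalone check of the arithmetic. The function names are hypothetical. The block counts `(3, 4, 6, 3)` and `(3, 4, 23, 3)` are the standard ResNet-50 and ResNet-101 stage sizes. The `+ 3` term for a three-stage body presumably accounts for three res5 blocks that live outside the body, which is an assumption about intent rather than something stated in this thread.

```python
def resnet_num_layers(block_counts):
    """Reproduce the num_layers formula from the constructor above."""
    conv_x = len(block_counts) + 1
    # Each bottleneck block has 3 conv layers; the "+ 2" and the
    # "+ 3 * (conv_x == 4)" terms are copied from the formula as written.
    return (sum(block_counts) + 3 * (conv_x == 4)) * 3 + 2


def body_spatial_scale(block_counts, res5_dilation=1):
    """Feature-map scale relative to the input image, as in the constructor."""
    if len(block_counts) == 4:
        return 1 / 32 * res5_dilation
    return 1 / 16


if __name__ == "__main__":
    print(resnet_num_layers((3, 4, 6, 3)))   # four-stage ResNet-50 body
    print(resnet_num_layers((3, 4, 6)))      # three-stage (conv4) body
    print(body_spatial_scale((3, 4, 6)))
```

Both the three-stage and four-stage ResNet-50 bodies evaluate to 50 layers under this formula, and inserting the context block into res4 does not change either value because the formula depends only on `block_counts`.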

Actual results

[][e2e_nl_mask_rcnn_R-50-FPN_1x.yaml][Step 37061 / 90000]
        loss: 0.046611, lr: 0.020000 time: 2.231452, eta: 1 day, 8:48:53
        accuracy_cls: 0.996273
        loss_cls: 0.009185, loss_bbox: 0.005711, loss_mask: 0.029372
        loss_rpn_cls: 0.000334, loss_rpn_bbox: 0.000816
        loss_rpn_cls_fpn2: 0.000087, loss_rpn_cls_fpn3: 0.000020, loss_rpn_cls_fpn4: 0.000014, loss_rpn_cls_fpn5: 0.000018, loss_rpn_cls_fpn6: 0.000026
        loss_rpn_bbox_fpn2: 0.000322, loss_rpn_bbox_fpn3: 0.000078, loss_rpn_bbox_fpn4: 0.000065, loss_rpn_bbox_fpn5: 0.000059, loss_rpn_bbox_fpn6: 0.000192
/teamscratch/msravcshare/yuyua/code/segmentation/PANet/lib/utils/boxes.py:66: RuntimeWarning: Negative areas founds: 3
  warnings.warn("Negative areas founds: %d" % neg_area_idx.size, RuntimeWarning)
/teamscratch/msravcshare/yuyua/code/segmentation/PANet/lib/modeling/generate_proposals.py:181: RuntimeWarning: invalid value encountered in greater_equal
  (x_ctr < im_info[1]) & (y_ctr < im_info[0]))[0]
/teamscratch/msravcshare/yuyua/code/segmentation/PANet/lib/modeling/generate_proposals.py:181: RuntimeWarning: invalid value encountered in less
  (x_ctr < im_info[1]) & (y_ctr < im_info[0]))[0]
[][e2e_nl_mask_rcnn_R-50-FPN_1x.yaml][Step 37081 / 90000]
        loss: nan, lr: 0.020000 time: 2.231213, eta: 1 day, 8:47:55
        accuracy_cls: 0.941802
        loss_cls: nan, loss_bbox: nan, loss_mask: nan
        loss_rpn_cls: nan, loss_rpn_bbox: nan
        loss_rpn_cls_fpn2: nan, loss_rpn_cls_fpn3: nan, loss_rpn_cls_fpn4: nan, loss_rpn_cls_fpn5: nan, loss_rpn_cls_fpn6: nan
        loss_rpn_bbox_fpn2: nan, loss_rpn_bbox_fpn3: nan, loss_rpn_bbox_fpn4: nan, loss_rpn_bbox_fpn5: nan, loss_rpn_bbox_fpn6: nan
/root/miniconda3/lib/python3.6/site-packages/numpy/lib/function_base.py:4033: RuntimeWarning: Invalid value encountered in median
  r = func(a, **kwargs)
[][e2e_nl_mask_rcnn_R-50-FPN_1x.yaml][Step 37101 / 90000]
        loss: nan, lr: 0.020000 time: 2.230715, eta: 1 day, 8:46:44
        accuracy_cls: 0.000000
        loss_cls: nan, loss_bbox: nan, loss_mask: nan
        loss_rpn_cls: nan, loss_rpn_bbox: nan
        loss_rpn_cls_fpn2: nan, loss_rpn_cls_fpn3: nan, loss_rpn_cls_fpn4: nan, loss_rpn_cls_fpn5: nan, loss_rpn_cls_fpn6: nan
        loss_rpn_bbox_fpn2: nan, loss_rpn_bbox_fpn3: nan, loss_rpn_bbox_fpn4: nan, loss_rpn_bbox_fpn5: nan, loss_rpn_bbox_fpn6: nan
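In the log above, the negative-area and "invalid value" warnings appear between the last finite step (37061) and the first NaN step (37081), which suggests that non-finite values had already reached the proposal boxes before the logged loss turned NaN. A minimal, dependency-free sketch of two checks that could help locate the first bad step is shown below. The function names are hypothetical, and the `+ 1` in the width and height follows the Detectron-style box convention, which is an assumption about `boxes.py` rather than a quote from it.

```python
import math


def bad_box_indices(boxes):
    """Return indices of [x1, y1, x2, y2] boxes with negative or non-finite area.

    Assumes the inclusive-pixel convention: width = x2 - x1 + 1.
    A box with both sides negative yields a positive area and is not flagged.
    """
    bad = []
    for i, (x1, y1, x2, y2) in enumerate(boxes):
        area = (x2 - x1 + 1) * (y2 - y1 + 1)
        if not math.isfinite(area) or area < 0:
            bad.append(i)
    return bad


def non_finite_losses(losses):
    """Return the sorted names of loss terms that are NaN or infinite."""
    return sorted(name for name, value in losses.items()
                  if not math.isfinite(value))


if __name__ == "__main__":
    boxes = [[0, 0, 10, 10], [10, 0, 0, 10], [0, 0, float("nan"), 5]]
    print(bad_box_indices(boxes))
    print(non_finite_losses({"loss_cls": float("nan"), "loss_bbox": 0.005711}))
```

Calling a check like `non_finite_losses` on every iteration, rather than only at the 20-step logging interval, and stopping at the first non-empty result would narrow down which step and which loss term diverges first.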

PkuRainBow commented 5 years ago

I am trying the solution proposed by Ross (see the linked "Solution").

I will report the results later.

Redaimao commented 5 years ago

@PkuRainBow hello, have you fixed the problem? btw, which context model are you using? Thank you.

PkuRainBow commented 5 years ago

@Redaimao I have solved this problem. I use the Res50-FPN as the backbone.

Redaimao commented 5 years ago

@PkuRainBow Would you mind giving details on how you implemented it? Also, please share some details on the context module. Thanks a lot!

RainHxj commented 5 years ago

@PkuRainBow Hello, have you reproduced the performance of Mask R-CNN with non-local blocks (an increase of about 1 point)?