roytseng-tw / Detectron.pytorch

A pytorch implementation of Detectron. Both training from scratch and inferring directly from pretrained Detectron weights are available.
MIT License

RoiAlign Cause Feature Value to Explode During Training #70

Open liruilong940607 opened 6 years ago

liruilong940607 commented 6 years ago

I'm using the RoIAlign layer from your xform folder, but it seems to cause my feature values to explode to extremely large numbers after training for a while.

So I implemented a simple CropAlign operation based on F.grid_sample in PyTorch. After replacing the single line feature_rois = roialignOp(feature_var, rois_var) in my code with it, training works fine and the values never explode again.

This confuses me a lot. Has anyone else run into the same problem?

Below is my basic CropAlign. I have checked that its forward output is comparable to that of the RoIAlign in this repo, and since it is built on F.grid_sample in PyTorch, the backward pass is handled by autograd. I put a comparison demo of the two align layers here.
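One way to back up the claim about the backward pass is a numerical gradient check. The sketch below is not from the original post: it assumes a recent PyTorch in which `align_corners` is an explicit argument, and the tensor sizes and the `theta` values are made up for illustration. It runs `torch.autograd.gradcheck` on a small `F.grid_sample` crop in double precision on the CPU.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical small feature map; double precision is what gradcheck expects.
feat = torch.randn(1, 2, 5, 5, dtype=torch.double, requires_grad=True)

# Hypothetical affine parameters selecting a sub-window of the feature map.
theta = torch.tensor([[[0.5, 0.0, 0.1],
                       [0.0, 0.5, -0.1]]], dtype=torch.double)
grid = F.affine_grid(theta, (1, 2, 3, 3), align_corners=False)


def crop(x):
    # Bilinear sampling is linear in x, so the analytic and numerical
    # gradients with respect to x should agree closely.
    return F.grid_sample(x, grid, mode="bilinear",
                         padding_mode="border", align_corners=False)


# gradcheck raises an error on mismatch and returns True otherwise.
ok = torch.autograd.gradcheck(crop, (feat,), eps=1e-6, atol=1e-4)
print(ok)
```

This only checks the gradient with respect to the features. It says nothing about the custom RoIAlign kernel, which would need its own check.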

Since the whole training code is complex, and the problem can be solved by replacing a single line, I only include the core code here to demonstrate the problem.

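import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable
# RoIAlign (used further down) is the layer shipped with this repo; its import is omitted here.

def AffineAlignOp(features, idxs, aligned_height, aligned_width, Hs):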
    def _transform_matrix(Hs, w, h):
        _Hs = np.zeros(Hs.shape, dtype = np.float32)
        for i, H in enumerate(Hs):
            H0 = np.concatenate((H, np.array([[0, 0, 1]])), axis=0)
            A = np.array([[2.0 / w, 0, -1], [0, 2.0 / h, -1], [0, 0, 1]])
            A_inv = np.array([[w / 2.0, 0, w / 2.0], [0, h / 2.0, h/ 2.0], [0, 0, 1]])
            H0 = A.dot(H0).dot(A_inv)
            H0 = np.linalg.inv(H0)
            _Hs[i] = H0[:-1]
        return _Hs
    bz, C_feat, H_feat, W_feat = features.size()
    N = len(idxs)
    feature_select = features[idxs] # (N, feature_channel, feature_size, feature_size)
    Hs_new = _transform_matrix(Hs, w=W_feat, h=H_feat) # return (N, 2, 3)
    Hs_var = Variable(torch.from_numpy(Hs_new), requires_grad=False).cuda()
    flow = F.affine_grid(theta=Hs_var, size=(N, C_feat, H_feat, W_feat)).float().cuda()
    flow = flow[:,:aligned_height, :aligned_width, :]
    rois = F.grid_sample(feature_select, flow, mode='bilinear', padding_mode='border') # 'zeros' | 'border' 
    return rois

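def CropAlignOp(feature_var, rois_var, aligned_height, aligned_width, spatial_scale):
    # rois_var: (N, 5) rows of [batch_index, x1, y1, x2, y2] in input-image coordinates
    rois_np = rois_var.data.cpu().numpy()
    idxs = rois_np[:, 0].astype(np.int64)  # batch indices must be integers to index features
    affinematrixs_feat = []
    for roi in rois_np:
        # Scale the box from input-image coordinates to feature-map coordinates
        x1, y1, x2, y2 = roi[1:] * float(spatial_scale)
        # Affine map that sends the box onto an aligned_width x aligned_height patch
        matrix = np.array([[aligned_width/(x2-x1), 0, -aligned_width/(x2-x1)*x1],
                           [0, aligned_height/(y2-y1), -aligned_height/(y2-y1)*y1]
                          ])
        affinematrixs_feat.append(matrix)
    affinematrixs_feat = np.array(affinematrixs_feat)
    # Use the function arguments rather than the global align_size
    feature_rois = AffineAlignOp(feature_var, idxs, aligned_height, aligned_width, affinematrixs_feat)
    return feature_rois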

input_res = 512
feature_res = 128
align_size = 64

roialignOp = RoIAlign(aligned_height=align_size, 
                       aligned_width=align_size, 
                       spatial_scale=float(feature_res)/input_res, 
                       sampling_ratio=0)
# Usage:
# feature_rois = roialignOp(feature_var, rois_var)

# Usage:
# feature_rois = CropAlignOp(feature_var, rois_var, 
#                            aligned_height=align_size, 
#                            aligned_width=align_size, 
#                            spatial_scale=float(feature_res)/input_res)
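For readers on a recent PyTorch, the same idea can be sketched without NumPy or `Variable`. This is not the code from the post: `crop_align` is a hypothetical name, it assumes `align_corners=False` with pixel centers at half-integer coordinates, and it takes one bilinear sample per output cell. Its output may therefore differ from this repo's RoIAlign by a sub-pixel offset and by the lack of per-bin averaging.

```python
import torch
import torch.nn.functional as F


def crop_align(features, rois, out_h, out_w, spatial_scale=1.0):
    """Bilinear ROI crop built on affine_grid / grid_sample.

    features: (B, C, H, W) float tensor.
    rois: (N, 5) float tensor of [batch_index, x1, y1, x2, y2]
          in input-image coordinates.
    """
    _, C, H, W = features.shape
    n = rois.size(0)
    idxs = rois[:, 0].long()
    # Move the boxes into feature-map coordinates.
    x1, y1, x2, y2 = (rois[:, 1:] * spatial_scale).unbind(dim=1)

    # theta maps normalized output coordinates in [-1, 1] to normalized
    # feature-map coordinates covering [x1, x2] x [y1, y2].
    theta = features.new_zeros(n, 2, 3)
    theta[:, 0, 0] = (x2 - x1) / W
    theta[:, 0, 2] = (x1 + x2) / W - 1
    theta[:, 1, 1] = (y2 - y1) / H
    theta[:, 1, 2] = (y1 + y2) / H - 1

    grid = F.affine_grid(theta, (n, C, out_h, out_w), align_corners=False)
    return F.grid_sample(features[idxs], grid, mode="bilinear",
                         padding_mode="border", align_corners=False)


# Sanity check on a horizontal ramp: every row of the map is 0, 1, ..., 7.
feat = torch.arange(8.0).repeat(8, 1).view(1, 1, 8, 8)
rois = torch.tensor([[0.0, 2.0, 2.0, 6.0, 6.0]])
out = crop_align(feat, rois, out_h=4, out_w=4)
print(out[0, 0, 0])
```

With the box spanning x in [2, 6] and four output columns, each sample lands on a pixel center, so the printed row should be close to 2, 3, 4, 5. Generating the grid directly at the output size avoids building a full feature-sized grid and slicing it afterwards.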


roytseng-tw commented 6 years ago

Thank you for filing this issue; I'll look into it. Could you provide the training code (or any sample code) that makes the feature values explode?
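A small reproduction is easier to build once the first layer that blows up is known. The sketch below is an assumption about how one might instrument a model, not something taken from this thread: `attach_magnitude_hooks` and its `threshold` default are hypothetical, and it only sees layers that are registered as `nn.Module` submodules.

```python
import torch
import torch.nn as nn


def attach_magnitude_hooks(model, threshold=1e4):
    """Record submodules whose output is non-finite or exceeds `threshold`."""
    records = []

    def make_hook(name):
        def hook(module, inputs, output):
            # Skip modules that return tuples, dicts, or empty tensors.
            if not torch.is_tensor(output) or output.numel() == 0:
                return
            peak = output.detach().abs().max().item()
            if not torch.isfinite(output).all() or peak > threshold:
                records.append((name, peak))
        return hook

    # Skip the root module (empty name) so each layer is reported once.
    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules() if n]
    return records, handles


# Toy usage: a linear layer with deliberately huge weights.
model = nn.Sequential(nn.Linear(2, 2), nn.ReLU())
with torch.no_grad():
    model[0].weight.fill_(1e6)
    model[0].bias.zero_()

records, handles = attach_magnitude_hooks(model)
model(torch.ones(1, 2))
print(records)

# Remove the hooks once the offending layer has been identified.
for h in handles:
    h.remove()
```

The first entry in `records` is the earliest layer in the forward pass whose output crossed the threshold, which narrows down where to cut a minimal example.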