RoIAlign Causes Feature Values to Explode During Training

I'm using the RoIAlign layer from your xform folder, but it seems to cause my feature values to explode to extremely large numbers after training for a while.

So I implemented a simple CropAlign operation based on F.grid_sample in PyTorch. After replacing the single line feature_rois = roialignOp(feature_var, rois_var) in my code with it, training works fine and the values never explode again.

This confuses me a lot. Has anyone else run into the same problem?

Below is my basic CropAlign. I have checked that its forward output is comparable to that of the RoIAlign in this repo, and since it is built on F.grid_sample in PyTorch, the backward pass is handled by autograd, so there is no custom backward code to get wrong. I put a comparison demo of the two align layers here.
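To make the sampling idea concrete, here is a minimal plain-Python sketch of bilinear crop-and-resize with border clamping. It is only an illustration: the names bilinear_sample and crop_resize are made up for this example, it works on a single-channel nested list, and it assumes output pixel j samples the feature map at x1 + j * (x2 - x1) / out_w. It does not reproduce the exact pixel conventions of F.affine_grid / F.grid_sample or of the RoIAlign layer.

```python
import math


def bilinear_sample(feat, x, y):
    """Bilinearly interpolate a 2D list `feat` at the float position (x, y)."""
    h, w = len(feat), len(feat[0])
    # Clamp to the valid range; this mimics padding_mode='border'.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = feat[y0][x0] * (1 - fx) + feat[y0][x1] * fx
    bottom = feat[y1][x0] * (1 - fx) + feat[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def crop_resize(feat, x1, y1, x2, y2, out_w, out_h):
    """Resample the ROI (x1, y1, x2, y2) to an out_h x out_w grid.

    Assumption: output pixel (i, j) samples the feature map at
    (x1 + j * (x2 - x1) / out_w, y1 + i * (y2 - y1) / out_h).
    """
    return [
        [
            bilinear_sample(
                feat,
                x1 + j * (x2 - x1) / out_w,
                y1 + i * (y2 - y1) / out_h,
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x4 feature map where feat[y][x] = x + 4 * y.
feat = [[float(x + 4 * y) for x in range(4)] for y in range(4)]
crop = crop_resize(feat, 0.0, 0.0, 2.0, 2.0, out_w=4, out_h=4)
```

Because every output value is a convex combination of at most four input values, a crop produced this way cannot exceed the range of the feature map it reads from, which is one property worth checking when comparing the forward passes of the two layers.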

Because the full training code is complex, and because the problem goes away with a single-line replacement, I am only posting the core code needed to demonstrate it.

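import numpy as np
import torch
import torch.nn.functional as F
from torch.autograd import Variable

def AffineAlignOp(features, idxs, aligned_height, aligned_width, Hs):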
    def _transform_matrix(Hs, w, h):
        _Hs = np.zeros(Hs.shape, dtype = np.float32)
        for i, H in enumerate(Hs):
            H0 = np.concatenate((H, np.array([[0, 0, 1]])), axis=0)
            A = np.array([[2.0 / w, 0, -1], [0, 2.0 / h, -1], [0, 0, 1]])
            A_inv = np.array([[w / 2.0, 0, w / 2.0], [0, h / 2.0, h/ 2.0], [0, 0, 1]])
            H0 = A.dot(H0).dot(A_inv)
            H0 = np.linalg.inv(H0)
            _Hs[i] = H0[:-1]
        return _Hs
    bz, C_feat, H_feat, W_feat = features.size()
    N = len(idxs)
    feature_select = features[idxs] # (N, feature_channel, feature_size, feature_size)
    Hs_new = _transform_matrix(Hs, w=W_feat, h=H_feat) # return (N, 2, 3)
    Hs_var = Variable(torch.from_numpy(Hs_new), requires_grad=False).cuda()
    flow = F.affine_grid(theta=Hs_var, size=(N, C_feat, H_feat, W_feat)).float().cuda()
    flow = flow[:,:aligned_height, :aligned_width, :]
    rois = F.grid_sample(feature_select, flow, mode='bilinear', padding_mode='border') # 'zeros' | 'border' 
    return rois

def CropAlignOp(feature_var, rois_var, aligned_height, aligned_width, spatial_scale):
    rois_np = rois_var.data.cpu().numpy()
    idxs = rois_np[:,0].astype(np.int64) # batch indices must be integers to index the feature tensor
    affinematrixs_feat = []
    for roi in rois_np:
        x1, y1, x2, y2 = roi[1:] * float(spatial_scale)
        matrix = np.array([[aligned_width/(x2-x1), 0, -aligned_width/(x2-x1)*x1],
                           [0, aligned_height/(y2-y1), -aligned_height/(y2-y1)*y1]
                          ])
        affinematrixs_feat.append(matrix)
    affinematrixs_feat = np.array(affinematrixs_feat)
    # use the function arguments rather than the global align_size
    feature_rois = AffineAlignOp(feature_var, idxs, aligned_height, aligned_width, affinematrixs_feat)
    return feature_rois

input_res = 512
feature_res = 128
align_size = 64

roialignOp = RoIAlign(aligned_height=align_size, 
                       aligned_width=align_size, 
                       spatial_scale=float(feature_res)/input_res, 
                       sampling_ratio=0)
# Usage:
# feature_rois = roialignOp(feature_var, rois_var)

# Usage:
# feature_rois = CropAlignOp(feature_var, rois_var, 
#                            aligned_height=align_size, 
#                            aligned_width=align_size, 
#                            spatial_scale=float(feature_res)/input_res)
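As a quick sanity check of the affine matrix built in CropAlignOp, the plain-Python sketch below confirms that the ROI corners map to the corners of the crop and that the inverse maps the crop center back to the ROI center. The helper names (roi_matrix, apply_affine, invert_affine) and the example ROI are made up for illustration. It covers only the pixel-space matrix and its inverse, not the [-1, 1] normalization done in _transform_matrix.

```python
def roi_matrix(x1, y1, x2, y2, out_w, out_h):
    """2x3 affine matrix taking feature-map pixels in the ROI to crop pixels."""
    sx = out_w / (x2 - x1)
    sy = out_h / (y2 - y1)
    return [[sx, 0.0, -sx * x1],
            [0.0, sy, -sy * y1]]


def apply_affine(m, x, y):
    """Apply a 2x3 affine matrix to the point (x, y)."""
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def invert_affine(m):
    """Inverse of a 2x3 affine matrix (crop pixels back to feature-map pixels)."""
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)]]


# Hypothetical ROI in input-image pixels, scaled to the feature map (128 / 512).
spatial_scale = 128 / 512
x1, y1, x2, y2 = (v * spatial_scale for v in (64.0, 128.0, 192.0, 384.0))

m = roi_matrix(x1, y1, x2, y2, out_w=64, out_h=64)
top_left = apply_affine(m, x1, y1)          # expected at the crop origin
bottom_right = apply_affine(m, x2, y2)      # expected at (out_w, out_h)
center_back = apply_affine(invert_affine(m), 32.0, 32.0)  # crop center -> ROI center
```

Note that the scale factors divide by (x2 - x1) and (y2 - y1), so a degenerate ROI with zero width or height would make the matrix blow up; that is worth ruling out in the ROI inputs when debugging either layer.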

System information

Operating system: Linux 14.04
CUDA version: 8.0
cuDNN version: 6
GPU models (for all devices if they are not all the same): TITAN X (Pascal)
python version: 3.5
pytorch version: 0.4
Anything else that seems relevant: None

roytseng-tw / Detectron.pytorch

RoIAlign Causes Feature Values to Explode During Training #70