yijingru / BBAVectors-Oriented-Object-Detection

[WACV2021] Oriented Object Detection in Aerial Images with Box Boundary-Aware Vectors
MIT License

Baseline method, mAP problem #63

Closed. byfate closed this issue 3 years ago

byfate commented 3 years ago

Final mAP: 77.8%

Final loss: in the range [2.1, 2.3] (over several runs)

Batch size 12 on two 2080 GPUs; other parameters as in main.py

def generate_ground_truth_base(self, image, annotation):
    # ---- setup omitted; the lines below run once per object k ----
    cen_x, cen_y, bbox_w, bbox_h, theta = rect  # minimum-area bounding rectangle of the four points after data_transform: x_center, y_center, w, h, theta (angle)
    # print(theta)
    radius = gaussian_radius((math.ceil(bbox_h), math.ceil(bbox_w)))  # compute the Gaussian kernel radius
    radius = max(0, int(radius))  # ensure radius >= 0
    ct = np.asarray([cen_x, cen_y], dtype=np.float32)  # center point as float
    ct_int = ct.astype(np.int32)  # center point as int
    # generate gt heatmap
    draw_umich_gaussian(hm[annotation['cat'][k]], ct_int, radius)  # arguments: H*W map, (center_x, center_y), radius
    ind[k] = ct_int[1] * image_w + ct_int[0]
    # generate wh gt
    wh[k, 0:2] = 1. * bbox_w, 1. * bbox_h
    # generate reg gt offset_x&y
    reg[k] = ct - ct_int  # regression target for the center offset (offset_x, offset_y)
    reg_mask[k] = 1  # set the mask for this object to 1
    assert 0 >= theta >= -90, "error angle"
    # generate angle
    ang[k] = theta
# and in __getitem__:
# ---- (omitted)
image = self.load_image(index)
# OpenCV format: NumPy array of shape (H, W, C)
image_h, image_w, c = image.shape
# ---- (preceding phase branches omitted)
elif self.phase == 'train':
    img_id = self.img_ids[index]
    annotation = self.load_annotation(index)
    data_dict = self.generate_ground_truth_base(image, annotation)
    return data_dict
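The `ind` and `reg` targets above store each center as a flat integer index plus a sub-pixel offset, and the decoder later adds the offset back. A minimal pure-Python sketch of that round trip follows. The helper names `encode_center` and `decode_center` and the map width of 152 are hypothetical and chosen only for illustration; the posted code uses NumPy arrays instead.

```python
def encode_center(cx, cy, width):
    """Split a float center into a flat row-major index and a sub-pixel offset."""
    # int() truncates, which matches astype(np.int32) for non-negative coordinates
    x_int, y_int = int(cx), int(cy)
    ind = y_int * width + x_int            # same layout as ind[k] = ct_int[1] * image_w + ct_int[0]
    offset = (cx - x_int, cy - y_int)      # same role as reg[k] = ct - ct_int
    return ind, offset


def decode_center(ind, offset, width):
    """Recover the float center from the flat index and the offset."""
    y_int, x_int = divmod(ind, width)
    return x_int + offset[0], y_int + offset[1]


# Hypothetical example: a 152-pixel-wide output map
ind, off = encode_center(37.25, 12.75, width=152)
center = decode_center(ind, off, width=152)
```

The width passed to both helpers must be the width of the map the index refers to. If `ind` is computed with one width and gathered from a map with another, the regression targets will be read from the wrong cells.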
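heat = pr_decs['hm']                            # N*C*H*W: batch * classes * height * width
wh = pr_decs['wh']                              # N*2*H*W: box width and height
reg = pr_decs['reg']                            # N*2*H*W: center offset (x, y)
cls_theta = pr_decs['ang']                      # N*1*H*W: predicted angle in this baseline (original comment: flag for horizontal vs. rotated box)
batch, c, height, width = heat.size()
heat = self._nms(heat)                           # NMS keeps local maxima; every other point is set to 0  # N*C*H*W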
# print("heat",heat.shape)
scores, inds, clses, ys, xs = self._topk(heat)  # batch*K
reg = self._tranpose_and_gather_feat(reg, inds)
# print("reg_shape", reg.shape) # N*k*2
reg = reg.view(batch, self.K, 2)  # reshape to (batch, K, 2)
xs = xs.view(batch, self.K, 1) + reg[:, :, 0:1]
ys = ys.view(batch, self.K, 1) + reg[:, :, 1:2]
clses = clses.view(batch, self.K, 1).float()
scores = scores.view(batch, self.K, 1)
wh = self._tranpose_and_gather_feat(wh, inds)
wh = wh.view(batch, self.K, 2)  # reshape to (batch, K, 2)
# add
ang = self._tranpose_and_gather_feat(cls_theta, inds)
ang = ang.view(batch, self.K, 1)
detections = torch.cat([xs,  # cen_x
                        ys,  # cen_y
                        wh,
                        ang,
                        scores,
                        clses],
                        dim=2)
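index = (scores > self.conf_thresh).squeeze(0).squeeze(1)  # conf_thresh sets the minimum confidence for keeping a detected box
detections = detections[:, index, :]  # returns 1*M*7 (cen_x, cen_y, w, h, ang, score, class); batches of more than one image are not supported here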
# print("decode base shape", detections.shape )
return detections.data.cpu().numpy()
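The `_nms` and `_topk` steps in the decoder keep only heatmap cells that are the maximum of their local neighborhood and then take the highest-scoring ones. The sketch below shows that idea on a small nested list in pure Python. The function names `peak_nms` and `top_k` and the 3x3 window are assumptions for illustration; the actual decoder works on batched tensors, and its implementation is not shown in this thread.

```python
def peak_nms(heat, kernel=3):
    """Keep only local maxima of a 2D heatmap (list of rows); zero everything else."""
    h, w = len(heat), len(heat[0])
    r = kernel // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window clipped at the borders
            neighborhood = [
                heat[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            ]
            if heat[y][x] == max(neighborhood):
                out[y][x] = heat[y][x]
    return out


def top_k(heat, k):
    """Return the k highest-scoring cells as (score, y, x) tuples."""
    cells = [(heat[y][x], y, x) for y in range(len(heat)) for x in range(len(heat[0]))]
    return sorted(cells, reverse=True)[:k]


heat = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.0],
    [0.1, 0.3, 0.2, 0.6],
    [0.0, 0.0, 0.1, 0.2],
]
peaks = peak_nms(heat)
best = top_k(peaks, 2)   # the two surviving peaks, at (y=1, x=1) and (y=2, x=3)
```

Because suppression happens before the confidence threshold, a low `conf_thresh` mostly changes how many weak peaks survive, not where the strong ones are.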
class LossBase(torch.nn.Module):
    def __init__(self):
        super(LossBase, self).__init__()
        self.L_hm = FocalLoss()          # class heatmap
        self.L_wh = OffSmoothL1Loss()    # box parameters (w, h)
        self.L_off = OffSmoothL1Loss()   # center offset
        self.L_ang = OffSmoothL1Loss()   # theta

    def forward(self, pr_decs, gt_batch):
        hm_loss = self.L_hm(pr_decs['hm'], gt_batch['hm'])
        wh_loss = self.L_wh(pr_decs['wh'], gt_batch['reg_mask'], gt_batch['ind'], gt_batch['wh'])

        off_loss = self.L_off(pr_decs['reg'], gt_batch['reg_mask'], gt_batch['ind'], gt_batch['reg'])
        # changed: clamp the predicted angle to [-90, 0] before computing the loss
        pr_decs['ang'] = torch.clamp(pr_decs['ang'], min=-90., max=0)

        ang_loss = self.L_ang(pr_decs['ang'], gt_batch['reg_mask'], gt_batch['ind'], gt_batch['ang'])
        if isnan(hm_loss) or isnan(wh_loss) or isnan(off_loss) or isnan(ang_loss):
            print('hm loss is {}'.format(hm_loss))
            print('wh loss is {}'.format(wh_loss))
            print('off loss is {}'.format(off_loss))
            print('ang loss is {}'.format(ang_loss))

        loss = hm_loss + wh_loss + off_loss + 0.1*ang_loss
        return loss
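One plausible reading of the `0.1*ang_loss` weight is that angle errors are measured in degrees and are therefore much larger than sub-pixel offset errors. The sketch below computes a masked smooth L1 loss in pure Python to show the difference in scale. It assumes `OffSmoothL1Loss` averages a standard smooth L1 (beta = 1) over the entries selected by `reg_mask`; that class is not shown in this thread, and the helper names here are hypothetical.

```python
def smooth_l1(pred, target, beta=1.0):
    """Standard smooth L1: quadratic below beta, linear above."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def masked_mean_smooth_l1(preds, targets, mask):
    """Average smooth L1 over the entries whose mask is non-zero."""
    losses = [smooth_l1(p, t) for p, t, m in zip(preds, targets, mask) if m]
    return sum(losses) / len(losses) if losses else 0.0


# Offsets are off by 0.3 px; angles are off by 10 degrees
offset_loss = masked_mean_smooth_l1([0.8, 0.1], [0.5, 0.4], [1, 1])
angle_loss = masked_mean_smooth_l1([-35.0, -80.0], [-45.0, -70.0], [1, 1])
```

With these made-up numbers the angle term is roughly 200 times larger than the offset term before weighting, so a 0.1 weight still leaves it dominant. Separately, `torch.clamp` passes zero gradient for values outside its range, so any angle prediction that falls outside [-90, 0] contributes no gradient through the clamped tensor.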
byfate commented 3 years ago

I have checked my code several times and followed your advice (e.g., loss weight, cv2.points). Could you please help me when it is convenient for you? I would appreciate it if you could point out my problem.

yijingru commented 3 years ago

Wait, you get 77.8 on DOTA's testing set? Input is 0.5 and 1 scale with 600x600 crop?

byfate commented 3 years ago

> Wait, you get 77.8 on DOTA's testing set? Input is 0.5 and 1 scale with 600x600 crop?

Sorry, it's the HRSC dataset; I didn't mention that.

yijingru commented 3 years ago

Oh, I see. The batch size would matter. I used about 100 epochs for HRSC2016. The code looks good to me. I think the current mAP means your code is good too.

byfate commented 3 years ago

OK, it looks like I have to rent some GPUs to see if the mAP can go higher. When the results come out, I will let you know. Thanks for your reply.

yijingru commented 3 years ago

No problem.

byfate commented 3 years ago

> No problem.

When I change the baseline method's learning rate to 1.0e-4, with batch size 12 on two 2080 Ti GPUs, the highest mAP is 83.6%, but it is not stable over the 100 epochs. I also looked at the box confidences, and many boxes have low confidence.

What is really strange is that when I run the baseline method on one Tesla V100 (a single card with 32 GB of memory) with learning rate 1.25e-4 and batch size 20, the highest mAP is about 77.7%. When I change the batch size to 12 and the learning rate to 1.0e-4, the highest mAP is 83.7%, but the training process is again unstable. It is hard to reach the highest mAP because it depends on the batch size and learning rate.

It seems that I need to try different learning rates and batch sizes during training. In any case, thanks for your kind reply. Much appreciated.
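When trying different batch sizes, one common starting point is to scale the learning rate along with the batch size. The sketch below shows two widely used heuristics, linear scaling and square-root scaling, applied to the settings reported above. Neither rule comes from this thread or from the repository; they are general rules of thumb, the function names are hypothetical, and the results are only starting points for a search.

```python
import math


def scale_lr_linear(base_lr, base_batch, new_batch):
    """Linear rule: the learning rate grows in proportion to the batch size."""
    return base_lr * new_batch / base_batch


def scale_lr_sqrt(base_lr, base_batch, new_batch):
    """Square-root rule: sometimes suggested for adaptive optimizers."""
    return base_lr * math.sqrt(new_batch / base_batch)


# Reference point taken from the V100 run above: lr 1.25e-4 at batch size 20
lr_linear = scale_lr_linear(1.25e-4, 20, 12)   # about 7.5e-5
lr_sqrt = scale_lr_sqrt(1.25e-4, 20, 12)       # about 9.7e-5
```

The square-root estimate for batch size 12 lands near the 1.0e-4 that gave the better mAP here, but a single pair of runs is not enough to say that the rule explains the difference.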

yijingru commented 3 years ago

Thank you for your report. I think the CUDA or library versions may matter; the newest GPUs might not be well supported. Just a guess... But I agree that the learning rate does matter when the batch size is small during backpropagation.

byfate commented 3 years ago

> Thank you for your report. I'm thinking maybe the Cuda or library matters, the newest GPU would not be greatly supported. Just guess... But I agree learning rate does matter when the batch size is small in the backpropagation.

OK, thank you.