Question about the backward pass of BinarizedF in the source code

muyiyiyi commented 3 years ago

import torch
from torch.autograd import Function


class BinarizedF(Function):
  @staticmethod
  def forward(ctx, input, threshold):
    # input: predicted confidence map (pred_map); threshold: threshold T
    ctx.save_for_backward(input, threshold)
    # ones_like/zeros_like already match input's device and dtype, so no .cuda() is needed
    a = torch.ones_like(input)
    b = torch.zeros_like(input)
    output = torch.where(input >= threshold, a, b)
    return output  # binary map: 1 where input >= threshold, else 0

  @staticmethod
  def backward(ctx, grad_output):
    # print('grad_output', grad_output)  # this argument is all ones by default
    input, threshold = ctx.saved_tensors
    grad_input = grad_weight = None

    if ctx.needs_input_grad[0]:
      grad_input = 0.2 * grad_output  # gradient w.r.t. pred
    if ctx.needs_input_grad[1]:
      grad_weight = -grad_output      # derivative w.r.t. T is -1
    return grad_input, grad_weight

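Hello authors, the results in this paper are excellent, and thank you for your work. I have a question about the BF part of the source code: why is grad_input = 0.2*grad_output? Was the 0.2 set by hand, or is it derived from something? I could not work it out from the paper either. Thank you for your help.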

taohan10200 commented 3 years ago

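The 0.2*grad_output gradient is what gets propagated back to update the backbone parameters. The 0.2 is a value we set by hand; its purpose is to balance how much the L1 loss and the L2 loss each contribute to updating the backbone. Learning the confidence map should be driven mainly by the L2 supervision. The L1 loss is computed on the segmentation result and serves to put more weight on regions that are hard to segment. If the L1 loss takes too large a share, the network's gradients can sometimes fluctuate heavily, since the binary map, unlike the confidence map, is discontinuous. So we reduce its weight when it is passed back through BF.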
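
To make the balancing concrete, here is a minimal sketch rather than the repository code. ScaledBinarize, its scale argument, and the toy tensors and losses below are all hypothetical, chosen only for illustration; the surrogate gradients follow the same rule as BinarizedF above. The sketch compares the gradient that reaches the confidence map from a toy L2 loss with the gradient that reaches it from a toy L1 loss through the binarization step.

```python
import torch
from torch.autograd import Function


class ScaledBinarize(Function):
    """Hypothetical stand-in for BinarizedF with a configurable scale."""

    @staticmethod
    def forward(ctx, pred, threshold, scale):
        ctx.scale = scale  # plain Python float, kept on ctx for backward
        return (pred >= threshold).to(pred.dtype)

    @staticmethod
    def backward(ctx, grad_output):
        # Surrogate gradients: scaled pass-through for pred, -1 for threshold,
        # and None for the non-tensor scale argument.
        return ctx.scale * grad_output, -grad_output, None


# Toy data (assumed values, not taken from the paper or the repository).
pred = torch.tensor([0.2, 0.6, 0.9], requires_grad=True)       # confidence map
threshold = torch.tensor([0.5, 0.5, 0.5], requires_grad=True)  # threshold map
target_conf = torch.tensor([0.0, 1.0, 1.0])
target_bin = torch.tensor([0.0, 0.0, 1.0])

binary = ScaledBinarize.apply(pred, threshold, 0.2)

l2 = ((pred - target_conf) ** 2).sum()   # toy L2 loss on the confidence map
l1 = (binary - target_bin).abs().sum()   # toy L1 loss on the binary map

# Gradient on pred from each loss separately.
grad_l2, = torch.autograd.grad(l2, pred, retain_graph=True)
grad_l1_pred, grad_l1_thr = torch.autograd.grad(l1, [pred, threshold])

print("L2 gradient on pred:         ", grad_l2)
print("L1 gradient on pred (via BF):", grad_l1_pred)
print("L1 gradient on threshold:    ", grad_l1_thr)
```

In this toy setup the L1 loss only touches the misclassified middle pixel, and its contribution to pred there is scaled down to 0.2 while the threshold receives the unscaled -1. Changing the scale argument changes only how strongly the L1 term pulls on the confidence map relative to the L2 term.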

muyiyiyi commented 3 years ago

Got it, thank you for the explanation!

taohan10200 / IIM, issue #18