Open knightyxp opened 2 years ago
Thanks for your attention, and sorry that I only just saw this issue. If lowering the initial lr doesn't work, I suggest changing the activation function in the last layer of the BN module. I changed the compressed sigmoid function to a linear function with the output limited to [0.25, 0.9], and the NaN no longer seems to appear.
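For readers who want to try this, here is a minimal sketch of a linear activation whose output is limited to [0.25, 0.9]. It is not the IIM implementation: the class name `ClampedLinear` and the `scale`, `shift`, and `learnable` arguments are assumptions made for illustration, and `torch.clamp` is just one way to enforce the range.

```python
import torch
import torch.nn as nn


class ClampedLinear(nn.Module):
    """Hypothetical activation: y = clamp(scale * x + shift, low, high)."""

    def __init__(self, scale=1.0, shift=0.0, low=0.25, high=0.9, learnable=False):
        super().__init__()
        scale_t = torch.tensor(float(scale))
        shift_t = torch.tensor(float(shift))
        if learnable:
            # Trainable scale and shift (an assumption, not confirmed in the thread).
            self.scale = nn.Parameter(scale_t)
            self.shift = nn.Parameter(shift_t)
        else:
            # Fixed scale and shift, stored as buffers so they follow .to(device).
            self.register_buffer("scale", scale_t)
            self.register_buffer("shift", shift_t)
        self.low = low
        self.high = high

    def forward(self, x):
        # Out-of-range values saturate at the bounds instead of being zeroed.
        return torch.clamp(self.scale * x + self.shift, self.low, self.high)


act = ClampedLinear()
x = torch.tensor([-1.0, 0.1, 0.5, 2.0])
y = act(x)
print(y)
```

With clamping, values outside the range are pinned to 0.25 or 0.9, so the threshold can never collapse to zero. The gradient is zero for inputs outside the range, so a learnable scale and shift may or may not help, depending on where the pre-activation values fall.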
Hi han tao: I just tried a linear function. Are the parameters of the linear activation function (the weight and the bias) learnable? When I tried a non-learnable linear function with the output limited to [0.25, 0.9], it did not work: the threshold value appears to be too small, even when I magnify the threshold 1000 times.
The linear function looks like this:

```python
import math

import torch
import torch.nn as nn
from torch.nn import init


class linear_limitation(nn.Module):
    def __init__(self, para=0.75, bias=0.15):
        super(linear_limitation, self).__init__()
        # Learnable weight and bias are currently disabled:
        # self.weight = Parameter(torch.Tensor(self.x.shape, self.x.shape))
        # self.bias = Parameter(torch.Tensor(self.x.shape))
        # self.reset_parameters()

    def reset_parameters(self):
        # Only usable once self.weight and self.bias are defined above.
        init.kaiming_uniform_(self.weight, a=math.sqrt(5))
        if self.bias is not None:
            fan_in, _ = init._calculate_fan_in_and_fan_out(self.weight)
            bound = 1 / math.sqrt(fan_in)
            init.uniform_(self.bias, -bound, bound)

    def forward(self, x):
        zero = torch.zeros_like(x)
        # output = x.matmul(self.weight) + self.bias
        # Values above 0.09 or below 0.025 are set to zero; the rest pass through.
        output = torch.where(x > 0.09, zero, x)
        output = torch.where(output < 0.025, zero, output)
        return output
```

The threshold values before the activation function look like this:
Hi tao han: I am a graduate student at SEU, trying to replace the backbone of IIM (VGG16_FPN or HRNet) with my Transformer crowd counting model. However, even when I lower the initial lr (1e-6 to 1e-7) on SHA, the threshold still becomes NaN at around epoch 700. Also, the best MAE is only 126, which is far worse than what my model achieves on SHA when combined with other losses (beyond MSE). I noticed that in https://github.com/taohan10200/IIM/issues/7#issuecomment-766274210 you mentioned that we could also lower the initial threshold. I want to confirm whether that refers to the initial weight of 0.5 in the Binarized module. Even when I change the initial weight to 0.4, t_max still starts at 0.54, so I am confused about the Binarized module. Looking forward to your reply. My email is knightyxp@gmail.com / 220192629@seu.edu.cn
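When chasing a NaN that only shows up after hundreds of epochs, it can help to guard each optimizer step. The sketch below is a generic PyTorch pattern, not part of IIM: the helper name `guarded_step` and the `max_norm` value are hypothetical, and the toy `nn.Linear` model only stands in for the real network.

```python
import torch
import torch.nn as nn


def guarded_step(model, optimizer, loss, max_norm=1.0):
    """Skip the update if the loss is non-finite; otherwise clip and step.

    Returns True when the optimizer step was applied.
    """
    optimizer.zero_grad()
    if not torch.isfinite(loss):
        # Leave the weights untouched so the offending batch can be inspected.
        return False
    loss.backward()
    # Bound the total gradient norm to limit sudden blow-ups.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return True


# Toy usage with a stand-in model.
torch.manual_seed(0)
model = nn.Linear(4, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
x = torch.randn(8, 4)
target = torch.randn(8, 1)

good_loss = ((model(x) - target) ** 2).mean()
stepped = guarded_step(model, optimizer, good_loss)

bad_loss = good_loss.detach() * float("nan")  # simulate a NaN loss
skipped = not guarded_step(model, optimizer, bad_loss)
```

Logging the epoch and batch whenever the helper returns `False` would show whether the NaN originates in the loss itself or only later in the threshold output.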