Closed by TheSunWillRise 4 years ago
Hi, according to the paper, the gradients with respect to the biases should be calculated to get the second term of the full gradient, but the code calculates the gradients of the intermediate features instead. Are these two approaches equivalent?
I'm glad that someone has read this paper. As I see it, this bias is not the one that appears in a CNN layer: the b in the paper is a per-neuron bias. So it cannot be computed by taking the derivative with respect to the CNN's shared b for each kernel. I'd like to discuss this paper further. Is there a topic or thread on it? I'm thinking of opening a post on Tieba, if you'd like to join.
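To make the per-neuron point concrete, here is a minimal sketch in plain Python. It is not the repository's code. It assumes a toy model made of one 1D convolution whose shared bias is copied to every output position, followed by a ReLU and a linear readout; all names (`conv1d`, `model`, `v`, and so on) are hypothetical. Under that assumption, each feature is y_i = (w * x)_i + b_i, so the derivative of the output with respect to the bias copy b_i should be the same as the derivative with respect to the feature y_i, while the derivative with respect to the single shared b is the sum over positions.

```python
def conv1d(x, w, b_per_neuron):
    # Valid 1D cross-correlation; every output position has its own bias entry.
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) + b_per_neuron[i]
            for i in range(len(x) - k + 1)]

def model(x, w, b_per_neuron, v):
    # Scalar output: ReLU on the conv features, then a linear readout v.
    y = conv1d(x, w, b_per_neuron)
    return sum(vi * max(yi, 0.0) for vi, yi in zip(v, y))

def grad_wrt_features(x, w, b_per_neuron, v):
    # Analytic d(out)/d(y_i) = v_i where the ReLU is active, else 0.
    y = conv1d(x, w, b_per_neuron)
    return [vi if yi > 0 else 0.0 for vi, yi in zip(v, y)]

def grad_wrt_per_neuron_bias(x, w, b_per_neuron, v, eps=1e-6):
    # Central finite differences, perturbing one bias copy at a time.
    grads = []
    for i in range(len(b_per_neuron)):
        up = list(b_per_neuron)
        dn = list(b_per_neuron)
        up[i] += eps
        dn[i] -= eps
        grads.append((model(x, w, up, v) - model(x, w, dn, v)) / (2 * eps))
    return grads

def grad_wrt_shared_bias(x, w, shared_b, v, n_out, eps=1e-6):
    # Central finite difference, perturbing the single shared bias value.
    up = model(x, w, [shared_b + eps] * n_out, v)
    dn = model(x, w, [shared_b - eps] * n_out, v)
    return (up - dn) / (2 * eps)

# Made-up toy values, chosen only for illustration.
x = [1.0, -2.0, 3.0, 0.5, -1.0]
w = [0.5, -1.0, 2.0]
v = [1.0, -2.0, 3.0]
shared_b = 0.1
n_out = len(x) - len(w) + 1
b = [shared_b] * n_out  # the shared bias seen as one copy per neuron

g_feat = grad_wrt_features(x, w, b, v)
g_bias = grad_wrt_per_neuron_bias(x, w, b, v)
g_shared = grad_wrt_shared_bias(x, w, shared_b, v, n_out)

print("per-feature gradients:        ", g_feat)
print("per-neuron bias gradients:    ", g_bias)
print("shared-bias gradient vs. sum: ", g_shared, sum(g_feat))
```

In this toy setting the per-neuron bias gradients match the feature gradients position by position, and the shared-bias gradient only recovers their sum. If the same holds for the layers in the actual network, taking gradients of the intermediate features would give the per-neuron quantity, whereas the gradient of the shared bias parameter would lose the spatial layout.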