Hi, yy,
I was wondering why you chose
xavier_normal_(self.weight, gain=2, mode='out')
instead of
nn.init.xavier_uniform_(self.weight)
when initializing weights. Also, looking through the xavier_normal_ function, I found that the weights don't seem to participate in gradient propagation. Why is that?
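For reference, here is a minimal sketch of the behavior I mean. It assumes the custom xavier_normal_ follows the same pattern as the stock torch.nn.init functions, which fill the tensor in place under torch.no_grad(). The sketch uses nn.init.xavier_uniform_ on a hypothetical 4x3 weight; the custom gain/mode variant is not reproduced here. As far as I can tell, only the initialization step is hidden from autograd, while the parameter itself still receives a gradient on backward:

```python
import torch
import torch.nn as nn

# Hypothetical 4x3 weight, standing in for self.weight in the question.
weight = nn.Parameter(torch.empty(4, 3))

# Stock initializer: fills the tensor in place under torch.no_grad(),
# so the initialization itself is not tracked by autograd.
nn.init.xavier_uniform_(weight)

# The parameter is still a leaf tensor that requires gradients.
print(weight.requires_grad)  # True
print(weight.grad_fn)        # None (leaf tensor, no recorded history)

# Toy forward/backward pass: the gradient does reach the weight.
x = torch.randn(2, 3)
loss = (x @ weight.t()).sum()
loss.backward()
print(weight.grad.shape)     # torch.Size([4, 3])
```

If the custom function behaves differently (for example, by detaching or replacing the parameter), that would be a separate issue from the no_grad() context shown here.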
Thanks for sharing, and have a nice day :)