fecet opened this issue 4 years ago
Why can't we handle the vanishing gradient problem in neural nets using large step sizes?

It can alleviate the problem, but it does not reliably give good results: a larger learning rate may overshoot the optimum, and because the same step size applies to layers whose gradients are not vanishing, it can also cause exploding gradients.
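A minimal numeric sketch of this point (the network, weights, and values here are purely illustrative, not from the thread): in a deep chain of sigmoid units, the gradient reaching the first layer is many orders of magnitude smaller than at the last layer, so a single learning rate scaled up enough to move the first layer produces an enormous, destabilizing update at the last layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
depth = 20
ws = np.random.randn(depth) * 0.5  # weights of a chain y = sigmoid(w_L * ... * sigmoid(w_1 * x))
x = 1.0

# Forward pass, storing activations.
acts = [x]
for w in ws:
    acts.append(sigmoid(w * acts[-1]))

# Backward pass: gradient of the output y w.r.t. each weight.
grads = np.zeros(depth)
upstream = 1.0  # dy/da at the current layer's output
for i in reversed(range(depth)):
    a_out, a_in = acts[i + 1], acts[i]
    local = a_out * (1 - a_out)          # sigmoid'(z), at most 0.25
    grads[i] = upstream * local * a_in   # dy/dw_i
    upstream = upstream * local * ws[i]  # chain rule down to the previous layer

print("gradient at last layer :", grads[-1])   # moderate
print("gradient at first layer:", grads[0])    # vanishingly small

# Scale the learning rate so the FIRST layer gets a usable update of ~0.1 ...
lr = 0.1 / abs(grads[0])
print("update at first layer  :", lr * grads[0])   # ~0.1, fine
# ... and the LAST layer's update explodes, overshooting any optimum.
print("update at last layer   :", lr * grads[-1])  # astronomically large
```

With ~20 layers the per-layer factor is roughly 0.1, so the first-layer gradient is about 10^17 times smaller than the last-layer one; no single step size can serve both at once.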
No. If the derivative values are all 0, multiplying them by any learning rate still gives 0.