Open jiatongli1997 opened 3 years ago
I have a question about your project. If you initialize the last layer to zeros, the gradients will also be zero for all layers, and then the model cannot be trained. Do I understand correctly?
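Hi, I understand what you mean. When the last layer is initialized to zeros, the gradients for the layers before it are zero. However, the gradient for the last layer itself is not zero, because it depends on that layer's input rather than on its own weights. So the last layer is optimized first. Once its weights are nonzero, gradients flow back to the other layers and they are optimized afterwards.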
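To make this concrete, here is a minimal sketch in plain Python. It is not the project's code; it is a toy two-layer linear model with scalar weights, and the names `grads` and `train` and all the numbers are chosen only for illustration. It assumes the input to the last layer is nonzero.

```python
# Toy model (illustrative only): h = w1 * x, y = w2 * h, loss = 0.5 * (y - t)**2
# w1 is the "earlier" layer, w2 is the zero-initialized "last" layer.

def grads(w1, w2, x, t):
    """Return (dloss/dw1, dloss/dw2) by the chain rule."""
    h = w1 * x           # output of the first layer
    y = w2 * h           # output of the last layer
    err = y - t          # dloss/dy
    g_w2 = err * h       # depends on h, not on w2, so it is nonzero when w2 == 0
    g_w1 = err * w2 * x  # multiplied by w2, so it vanishes while w2 == 0
    return g_w1, g_w2

def train(w1, w2, x, t, lr, steps):
    """Plain gradient descent; records the gradients seen at each step."""
    history = []
    for _ in range(steps):
        g_w1, g_w2 = grads(w1, w2, x, t)
        history.append((g_w1, g_w2))
        w1 -= lr * g_w1
        w2 -= lr * g_w2
    return w1, w2, history

w1, w2, history = train(w1=0.5, w2=0.0, x=2.0, t=1.0, lr=0.1, steps=3)
for step, (g_w1, g_w2) in enumerate(history):
    print(f"step {step}: grad_w1={g_w1:.4f}, grad_w2={g_w2:.4f}")
```

At the first step only `w2` receives a nonzero gradient. From the second step on, `w2` is no longer zero, so `w1` starts receiving a gradient as well.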