timoklein / redo

ReDo: The Dormant Neuron Phenomenon in Deep Reinforcement Learning (pytorch)

a question about redo #5

Closed initial-h closed 7 months ago

initial-h commented 8 months ago

In the paper, it says to reinitialize the incoming weights of dormant neurons and zero out their outgoing weights. I'm confused since, in my mind, each layer of the network is just a matrix. What are the incoming weights and the outgoing weights? Could you give me some hints? Thanks a lot!

timoklein commented 8 months ago

That's a great question! It took me a while to wrap my head around it too. Basically, think of a neuron as an abstract thing in between two PyTorch linear layers. In the diagram below, the linear layers can be seen as defining the edges. [image: neuron]

It's very unintuitive considering how a network is actually implemented :)
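
To make that concrete in code, here is a minimal sketch (not taken from this repo) of how one hidden neuron's incoming and outgoing weights map onto the rows and columns of two `nn.Linear` weight matrices. The layer sizes and the neuron index are arbitrary:

```python
import torch
import torch.nn as nn

layer1 = nn.Linear(4, 8)   # produces the hidden activations ("neurons")
layer2 = nn.Linear(8, 2)   # consumes those activations

neuron_idx = 3  # pick one hidden neuron

# Incoming weights: the row of layer1.weight (plus the matching bias entry)
# that computes this neuron's pre-activation.
incoming = layer1.weight[neuron_idx, :]   # shape (4,)
incoming_bias = layer1.bias[neuron_idx]

# Outgoing weights: the column of layer2.weight that carries this
# neuron's activation into the next layer.
outgoing = layer2.weight[:, neuron_idx]   # shape (2,)

print(incoming.shape, outgoing.shape)
```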

initial-h commented 8 months ago

Does this mean that, for Layer 1 in this picture, the incoming weights are w1 and the outgoing weights are w2? If so, will w2 also be considered the incoming weights for Layer N? Does this mean w1 will be reinitialized because they are the incoming weights of Layer 1, and w2 will also be reinitialized because they are the incoming weights of Layer N?

timoklein commented 8 months ago

Does this mean that, for Layer 1 in this picture, the incoming weights are w1 and the outgoing weights are w2?

Yes.

If so, will w2 also be considered the incoming weights for Layer N?

Yes, $W_2$ will be the incoming weights for layer 2 and so on.

Does this mean w1 will be reinitialized because they are the incoming weights of Layer 1, and w2 will also be reinitialized because they are the incoming weights of Layer N?

As I understood and implemented it: yes. Since the outgoing weights of layer N will be set to 0 in any case, it should not be an issue.
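
For illustration, here is a simplified sketch of that reset step. It is not the exact code from this repo, and the bias handling is an assumption on my part:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def reset_dormant_neuron(in_layer: nn.Linear, out_layer: nn.Linear, idx: int) -> None:
    """Reset hidden neuron `idx` that sits between `in_layer` and `out_layer`."""
    # Re-initialize the incoming weights (one row of in_layer) with the
    # same scheme nn.Linear uses by default (Kaiming uniform).
    fresh = torch.empty_like(in_layer.weight)
    nn.init.kaiming_uniform_(fresh, a=5 ** 0.5)
    in_layer.weight[idx, :] = fresh[idx, :]
    in_layer.bias[idx] = 0.0  # assumption: reset the incoming bias to 0

    # Zero the outgoing weights (one column of out_layer) so the overall
    # network output is unchanged right after the reset.
    out_layer.weight[:, idx] = 0.0
```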

initial-h commented 8 months ago

Will it affect learning if the weights are manually set to 0? Since the weights are updated by backpropagation, will they still be updated if they are masked, or will the gradients change because the values were set to 0? This confuses me a lot ...

I get that the idea is to reinitialize the weights so they can be updated and become active again, while guaranteeing that the output is not changed by the reinitialization. But I don't get the full picture of how this is achieved.

Thanks again.

timoklein commented 8 months ago

Will it affect learning if the weights are manually set to 0?

It will affect learning, but not necessarily in a negative way.

Since the weights are updated by backpropagation, will they still be updated if they are masked, or will the gradients change because the values were set to 0?

Yes, they will still be updated because not all the weights in a layer are set to 0. There will still be some gradient flow through the layer, even after a reset. With each gradient step after the reset, more and more neurons will move away from 0 again.

At least, that's my understanding from debugging the code. I hope it helps. These are all really good questions, feel free to ask more!
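
As a quick illustrative check (not from the repo), you can verify that zeroing one neuron's outgoing weights does not cut off gradient flow for the layers as a whole; the layer sizes and the squared-output loss are just placeholders:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))

with torch.no_grad():
    net[2].weight[:, 3] = 0.0   # zero the outgoing weights of hidden neuron 3

x = torch.randn(16, 4)
loss = net(x).pow(2).mean()
loss.backward()

print(net[2].weight.grad.abs().sum())   # non-zero: the output layer still learns
print(net[0].weight.grad.abs().sum())   # non-zero: the earlier layer still learns
```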

initial-h commented 8 months ago

Thanks for the reply! I have one more question: will the incoming weights still be updated? For example, in the picture below, assume the neuron at the bottom of the third layer is dormant, i.e. its activation is zero. Its 3 incoming weights are reinitialized and its outgoing weights are masked/set to 0 (if I understand correctly). Will any non-zero gradients be backpropagated to the 3 incoming weights, given that the outgoing weights are 0? [image: nn]

timoklein commented 8 months ago

Hi @initial-h, I'm slow to reply at the moment due to the ICML rebuttals and some personal issues. If you have answered your question in the meantime, feel free to close the issue. Otherwise, I'll come back to it in a couple of weeks.

initial-h commented 8 months ago

Sure, take your time and no hurry about it.

timoklein commented 7 months ago

Let's look at it in terms of the last layer and the loss because that's what you highlighted.

The DQN loss is basically $L(\mathbf w) = \left(r + \gamma \max_{a'} Q(s', a'; \mathbf w^-) - Q(s, a; \mathbf w)\right)^2$.

Calculating the gradient w.r.t. $\mathbf w$ yields $\nabla_{\mathbf w} L(\mathbf w) = -2\left(r + \gamma \max_{a'} Q(s', a'; \mathbf w^-) - Q(s, a; \mathbf w)\right) \nabla_{\mathbf w} Q(s, a; \mathbf w)$.

So even if some of the weights are 0, the gradient of the last layer shouldn't be 0. Does that seem right?
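
As a tiny sanity check of that gradient (illustrative only, using a linear Q head and made-up names like `phi` and `td_target`):

```python
import torch

torch.manual_seed(0)
phi = torch.randn(8)                   # features feeding the last layer
w = torch.randn(8, requires_grad=True) # last-layer weights
with torch.no_grad():
    w[3] = 0.0                         # pretend this weight was just reset to 0

q = w @ phi                            # Q(s, a; w)
td_target = torch.tensor(1.5)          # r + gamma * max_a' Q(s', a'; w^-), held fixed
loss = (td_target - q) ** 2
loss.backward()

# Equals -2 * (td_target - q) * phi[3], which is generally non-zero
# even though w[3] itself is currently 0.
print(w.grad[3])
```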

initial-h commented 7 months ago

Yeah, that makes sense. Thank you!