google / dopamine

Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
https://github.com/google/dopamine
Apache License 2.0

Score normalization in ReDo #209

Open yycho0108 opened 1 year ago

yycho0108 commented 1 year ago

I've been following the ReDo paper ("The Dormant Neuron Phenomenon in Deep Reinforcement Learning", https://arxiv.org/pdf/2302.12902.pdf), which describes the procedure for determining dormant neurons in terms of their normalized scores. A snapshot of the relevant passage is attached below.
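For reference, equation (1) in the paper defines (roughly, as I read it) the score of neuron $i$ in layer $\ell$ as

$$
s_i^{\ell} = \frac{\mathbb{E}_{x \in D}\left[\,|h_i^{\ell}(x)|\,\right]}{\frac{1}{H^{\ell}} \sum_{k \in \ell} \mathbb{E}_{x \in D}\left[\,|h_k^{\ell}(x)|\,\right]},
$$

where $h_i^{\ell}(x)$ is the activation of neuron $i$ under the input distribution $D$, $H^{\ell}$ is the number of neurons in layer $\ell$, and a neuron is called $\tau$-dormant when $s_i^{\ell} \le \tau$.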

As far as I can tell, this line implements equation (1) to compute the scores: https://github.com/google/dopamine/blob/a6f414ca01a81e933359a4922965178a40e0f38a/dopamine/labs/redo/weight_recyclers.py#L314 However, while that line appears to implement the equation faithfully, it doesn't seem to match the text description that follows, which says "We normalize the scores such that they sum to 1 within a layer". To implement that description, shouldn't the line be:

 score /= jnp.sum(score) + 1e-9

instead? I don't see how the scores can sum to 1 under the normalization scheme as implemented by the original equation and the code.
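To make the difference concrete, here's a minimal sketch (the `activations` values are made up, and this isn't the repo's code) comparing the scaling that the current line performs with the sum-to-1 variant:

```python
import jax.numpy as jnp

# Hypothetical per-neuron mean absolute activations for one layer (H = 4 neurons).
activations = jnp.array([0.1, 0.0, 0.3, 0.6])

# Equation (1) as written: divide each neuron's value by the per-layer mean
# (eps for numerical stability).
score_scaled = activations / (jnp.mean(activations) + 1e-9)
print(score_scaled.sum())      # ~4.0 -> sums to H, not to 1

# Sum-to-1 normalization, matching "scores sum to 1 within a layer".
score_normalized = activations / (jnp.sum(activations) + 1e-9)
print(score_normalized.sum())  # ~1.0
```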

Thank you!

psc-g commented 1 year ago

hi, apologies for the delay in the response.

you are correct in that it's not actually normalizing in the sense that the values add up to 1. we are in fact scaling the "normalized" values by the number of neurons in the layer.

this may have been an oversight on our end, and we will add a clarifying note to the paper and to this code. however, given that our experiments were run with this setup, we will keep the code as is!

if you decide to correct this and emit properly normalized scores, you will likely have to adjust the threshold as well. do let us know if you find anything interesting!
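for what it's worth, a small sketch of the relationship (hypothetical helpers, not from the repo, assuming `activations` holds the per-neuron mean absolute activations of one layer): since the scores as implemented are the sum-to-1 scores multiplied by the layer width H, a threshold tau on the current scores corresponds to roughly tau / H on properly normalized ones.

```python
import jax.numpy as jnp

def dormant_mask(activations: jnp.ndarray, tau: float) -> jnp.ndarray:
  """Dormancy test with the scaling currently in the code (divide by layer mean)."""
  score = activations / (jnp.mean(activations) + 1e-9)
  return score <= tau

def dormant_mask_normalized(activations: jnp.ndarray, tau: float) -> jnp.ndarray:
  """Equivalent test with sum-to-1 scores: rescale the threshold by the layer width."""
  score = activations / (jnp.sum(activations) + 1e-9)
  return score <= tau / activations.shape[-1]
```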

initial-h commented 3 months ago

Hi @yycho0108 @psc-g , I have some confusion about ReDo. The paper says to reinitialize a dormant neuron's incoming weights and zero out its outgoing weights. I'm confused because, in my mind, each layer of the network is just a matrix, so I'm not sure what the incoming and outgoing weights refer to. Could you give me some hints? Thanks a lot!