dantp-ai closed this issue 2 years ago.
This finding is really interesting! But in my experience the special, careful initialization actually has no influence on performance.
Evidence for your claim?
I just tried running the experiment without this initialization and nothing changed. Here is my report for this environment; you can have a look if interested: https://github.com/ZeratuuLL/Reinforcement-Learning/blob/master/Continuous%20Control/Report_Reacher.pdf
Linked to PR #15
`fan_in = layer.weight.data.size()[0]` is wrong, because fan-in is defined as the number of input units to the layer. PyTorch stores the weight matrix of `nn.Linear` transposed, i.e. as (out_features, in_features), so we need to access the second component of the size: `fan_in = layer.weight.data.size()[1]`.
See an example of a correct fan-in computation here: https://pytorch.org/docs/stable/_modules/torch/nn/init.html#kaiming_normal_, specifically `_calculate_fan_in_and_fan_out(tensor)`.
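For illustration, here is a minimal sketch of the fix, assuming the common DDPG-style uniform(-1/sqrt(fan_in), 1/sqrt(fan_in)) hidden-layer initialization; the helper name `hidden_init` and the layer sizes are just examples, not necessarily the code in this repo:

```python
import numpy as np
import torch.nn as nn

def hidden_init(layer):
    # nn.Linear stores its weight as (out_features, in_features),
    # so the fan-in (number of inputs to the layer) is size()[1], not size()[0].
    fan_in = layer.weight.data.size()[1]
    lim = 1.0 / np.sqrt(fan_in)
    return (-lim, lim)

# Example: a layer mapping a 33-dimensional input to 400 hidden units.
layer = nn.Linear(33, 400)
layer.weight.data.uniform_(*hidden_init(layer))
print(hidden_init(layer))  # roughly (-0.174, 0.174), since fan_in = 33
```

With the original `size()[0]`, this layer would use fan_in = 400 instead of 33, giving a noticeably narrower initialization range.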