fishfishson opened 4 years ago (status: Open)
Hi. In your paper, Lemma 1.6 concerns the distribution of X and Y = sin(pi/2 * X). We know that the output of a linear layer is normally distributed when a specific initialization, uniform(-c, c), is used. However, in your code the activation is just torch.sin(x), not torch.sin(pi/2 * x). Is there something I missed?
Same question. Have you found an answer since?
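For anyone who wants to check numerically how much the pi/2 factor matters, here is a minimal sketch. It assumes X ~ N(0, 1) and compares against the arcsine distribution on [-1, 1]; that target is my assumption about what the lemma has in mind. The helper names (`arcsine_cdf`, `max_cdf_gap`) are hypothetical and not taken from the repository. It uses only the Python standard library, so it says nothing about what the actual layers in the code produce.

```python
import math
import random


def arcsine_cdf(y):
    """CDF of the arcsine distribution on [-1, 1] (assumed target)."""
    return 0.5 + math.asin(y) / math.pi


def max_cdf_gap(scale, n=100_000, seed=0):
    """Largest gap between the empirical CDF of sin(scale * X), X ~ N(0, 1),
    and the arcsine CDF (a Kolmogorov-Smirnov style statistic)."""
    rng = random.Random(seed)
    ys = sorted(math.sin(scale * rng.gauss(0.0, 1.0)) for _ in range(n))
    gap = 0.0
    for i, y in enumerate(ys):
        target = arcsine_cdf(y)
        # The empirical CDF jumps from i/n to (i+1)/n at y; check both sides.
        gap = max(gap, abs(target - i / n), abs((i + 1) / n - target))
    return gap


gap_half_pi = max_cdf_gap(math.pi / 2)  # Y = sin(pi/2 * X)
gap_plain = max_cdf_gap(1.0)            # Y = sin(X)
print(f"sin(pi/2 * X): max CDF gap = {gap_half_pi:.4f}")
print(f"sin(X):        max CDF gap = {gap_plain:.4f}")
```

With a unit-variance Gaussian input, the gap for sin(pi/2 * X) should come out clearly smaller than the gap for sin(X), so the factor does make a difference under that assumption. Whether the pre-activations in the repository actually have unit variance, or whether the scale is absorbed somewhere else (for example in the initialization), is something only the authors can confirm.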