Closed: iamhankai closed this issue 3 years ago
What's the Norm function in Eq.(3)? LayerNorm?
We use batch norm for the experiments in the paper.
Thanks a lot.
Another question: for the input BxHxNxN, which dim is BN performed along?
Hi, it is applied along the batch dimension.
nn.BatchNorm2d(num_features=H) or nn.BatchNorm2d(num_features=N)?
Good work! What about the initialization of the HxH matrix? Is it torch.eye or torch.randn?
Hi, should be this one: nn.BatchNorm2d(num_features=H)
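A minimal sketch of what this reply implies, assuming the input has shape (B, H, N, N) and the H heads play the role of channels. The sizes and variable names are illustrative and are not taken from the paper's code.

```python
import torch
import torch.nn as nn

B, H, N = 2, 4, 8  # illustrative sizes: batch, heads, tokens

# Treat the H heads as channels: statistics are computed per head,
# pooled over the batch and both token dimensions (B, N, N).
norm = nn.BatchNorm2d(num_features=H)

attn = torch.randn(B, H, N, N)  # stand-in for an attention map
out = norm(attn)                # training mode: uses batch statistics

# Each head ends up with roughly zero mean and unit variance.
per_head_mean = out.mean(dim=(0, 2, 3))
per_head_var = out.var(dim=(0, 2, 3), unbiased=False)
```

With num_features=H the layer keeps one learnable scale and shift per head, which is why the output shape is unchanged.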
Hi,
Thanks for your interest. This is a good question. I tried both the eye init and the random init, and the results are similar. For the experiments in the paper, the results are based on random init. I am considering adding a set of experiments on the initialization to the paper as well.
Best regards, Zhou Daquan
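For reference, the two initializations discussed above can be sketched as follows. The `mix_heads` helper and the way the HxH matrix is applied across the head dimension are assumptions made for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

H = 4  # illustrative number of heads

# Two candidate initializations for a learnable HxH matrix.
theta_eye = nn.Parameter(torch.eye(H))        # identity: starts as a no-op mix
theta_rand = nn.Parameter(torch.randn(H, H))  # random: heads are mixed from the start

def mix_heads(attn, theta):
    # Assumed application: combine attention maps across the head dimension.
    # attn: (B, H, N, N), theta: (H, H) -> output: (B, H, N, N)
    return torch.einsum("hg,bhij->bgij", theta, attn)

attn = torch.softmax(torch.randn(2, H, 8, 8), dim=-1)
mixed_eye = mix_heads(attn, theta_eye)    # equals attn at initialization
mixed_rand = mix_heads(attn, theta_rand)  # already a mixture of heads
```

Under this assumption, the eye init leaves the attention maps unchanged at the start of training, while the random init mixes heads immediately.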
Thank you very much.