yunfanLu opened 2 years ago
```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    def __init__(self, init_value=1e-3):
        super().__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale
```
When `self.scale` is 1, does this layer do nothing? Why do we need it?
Is `self.scale` the learnable parameter 𝜆𝑥 from the paper?
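For what it's worth, even when `self.scale` is initialized to 1 (so the layer is initially an identity), it is registered as an `nn.Parameter`, so it receives gradients and is updated by the optimizer during training. A small check I wrote to convince myself (my own sketch, not code from this repo):

```python
import torch
import torch.nn as nn

# Same Scale layer as in the snippet above.
class Scale(nn.Module):
    def __init__(self, init_value=1e-3):
        super().__init__()
        self.scale = nn.Parameter(torch.FloatTensor([init_value]))

    def forward(self, input):
        return input * self.scale

layer = Scale(init_value=1.0)  # starts out as an identity mapping
x = torch.randn(4)
loss = layer(x).sum()          # loss = scale * x.sum()
loss.backward()

# scale appears in parameters() and gets a gradient (here: x.sum()),
# so an optimizer step would move it away from 1.0.
print(layer.scale.requires_grad)
print(layer.scale.grad)
```

So the layer is not a no-op: it gives the network a cheap, per-branch learnable gain, which only *starts* at its init value.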