Open ootts opened 2 years ago
Hi @ootts, apologies for the delayed response. This is simply a hyperparameter that amplifies the gradients of the learnable parameter. The paper only states that this parameter is learnable, which means it can be parameterized by any learnable mapping. We just need to find a suitable one.
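To illustrate what "amplifies the gradients" can mean, here is a minimal sketch. It assumes the effective parameter is computed as a constant factor times a stored value; the thread does not confirm that the repository does it this way. The names `raw`, `scale`, and `loss` are hypothetical and are not taken from the code or the paper. By the chain rule, the gradient with respect to the stored value is then `scale` times the gradient with respect to the effective parameter.

```python
# Hypothetical illustration: raw, scale, and loss are made-up names,
# not identifiers from the repository or the paper.

def loss(value):
    # Toy quadratic loss on the effective parameter value.
    return (value - 3.0) ** 2

def grad_wrt_value(value, eps=1e-6):
    # Central finite difference of loss with respect to the effective value.
    return (loss(value + eps) - loss(value - eps)) / (2 * eps)

def grad_wrt_raw(raw, scale, eps=1e-6):
    # Central finite difference of loss(scale * raw) with respect to raw.
    return (loss(scale * (raw + eps)) - loss(scale * (raw - eps))) / (2 * eps)

raw, scale = 0.1, 10.0
g_value = grad_wrt_value(scale * raw)  # gradient seen by the effective parameter
g_raw = grad_wrt_raw(raw, scale)       # gradient seen by the stored parameter
ratio = g_raw / g_value                # approximately equal to scale
print(ratio)
```

Under this assumption, a larger `scale` makes the stored parameter move faster for the same learning rate, which is why the factor behaves like a hyperparameter and not like something the paper needs to specify.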
And which part does it correspond to in the paper? Thanks a lot!