Public: Can scale the kernel initialisers in MCSoftmaxDenseFA.
The main use case is scaling down the magnitude of the covariance at initialisation. In particular, this lets a fine-tuning procedure imitate a Dense layer in the first few steps.
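As a rough illustration of the idea, the sketch below wraps a base initialiser so that every sampled weight is multiplied by a small factor. It is framework-agnostic and does not use the actual MCSoftmaxDenseFA API: the names `normal_init` and `scaled` are hypothetical, and it assumes the covariance parameters are produced by kernels applied to the layer input, so that smaller initial kernels give a smaller initial covariance.

```python
import random


def normal_init(stddev=1.0):
    """Hypothetical base initialiser: a rows x cols matrix of Gaussian draws."""
    def init(rng, rows, cols):
        return [[rng.gauss(0.0, stddev) for _ in range(cols)] for _ in range(rows)]
    return init


def scaled(base_init, scale):
    """Wrap an initialiser so every sampled weight is multiplied by `scale`."""
    def init(rng, rows, cols):
        return [[scale * w for w in row] for row in base_init(rng, rows, cols)]
    return init


base = normal_init()
small = scaled(base, 1e-3)  # 1e-3 is an arbitrary illustrative factor

# Same seed for both, so the only difference is the scaling factor.
w_base = base(random.Random(0), 4, 3)
w_small = small(random.Random(0), 4, 3)

max_base = max(abs(w) for row in w_base for w in row)
max_small = max(abs(w) for row in w_small for w in row)
```

With a small enough factor, the covariance contribution starts close to zero, so the layer's initial outputs are dominated by the mean (logit) path, which is what a plain Dense layer would compute.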