chanind opened this issue 2 weeks ago
Hi @chanind!
I've created PR #376 that implements one of the alternatives you suggested - adding a separate `l0_lambda` parameter specifically for JumpReLU training.
The PR:
- adds an `l0_lambda` parameter (default: 0.0)
- requires `l0_lambda` specification when using JumpReLU

Would love to get your thoughts on this implementation!
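For context, a minimal sketch of the kind of config change the PR describes; the config class and all field names besides `l0_lambda` are illustrative, not necessarily the library's actual API:

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Illustrative stand-in for the real trainer config.
    architecture: str = "standard"
    l1_coefficient: float = 1.0
    l0_lambda: float = 0.0  # new: L0 penalty weight, only used by JumpReLU

    def __post_init__(self) -> None:
        # JumpReLU trains against an L0 penalty, so require an explicit
        # l0_lambda rather than silently reusing l1_coefficient.
        if self.architecture == "jumprelu" and self.l0_lambda == 0.0:
            raise ValueError("l0_lambda must be set when training a JumpReLU SAE")
```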
Proposal
Now that we support JumpReLU training, the `l1_coefficient` is confusing since JumpReLU uses an L0 loss, not L1, for training. We should rename this parameter to `sparsity_coefficient` since it is a coefficient used to promote sparsity in general. We should also rename `l1_warmup_steps` to `sparsity_warmup_steps`.
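Concretely, the rename would look something like this (plain keyword arguments standing in for whatever config object the trainer uses):

```python
# Before: L1-specific names, even though JumpReLU's penalty is L0.
cfg = dict(l1_coefficient=5.0, l1_warmup_steps=1000)

# After: architecture-neutral names covering both L1 and L0 penalties.
cfg = dict(sparsity_coefficient=5.0, sparsity_warmup_steps=1000)
```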
Motivation
It is confusing to see `l1_coefficient` used for JumpReLU training, which doesn't use an L1 loss.

Alternatives
Alternatively, we could add a separate `l0_coefficient`/`l0_warmup_steps` that is only used for JumpReLU training, and error if `l1_coefficient` is provided. This would also potentially allow training a JumpReLU with both L0 and L1 loss if desired.
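A sketch of what that combined penalty could look like, using the hypothetical coefficient names above; a plain nonzero count stands in here for the differentiable L0 surrogate that JumpReLU training actually needs:

```python
import torch


def sparsity_penalty(feature_acts: torch.Tensor,
                     l0_coefficient: float,
                     l1_coefficient: float) -> torch.Tensor:
    """Combined sparsity penalty over a batch of SAE feature activations."""
    # L0 term: mean number of active (nonzero) features per example.
    # Real JumpReLU training replaces this hard count with a
    # straight-through surrogate so gradients can reach the thresholds.
    l0_term = (feature_acts != 0).float().sum(dim=-1).mean()
    # L1 term: mean total activation magnitude per example.
    l1_term = feature_acts.abs().sum(dim=-1).mean()
    return l0_coefficient * l0_term + l1_coefficient * l1_term
```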