jbloomAus / SAELens

Training Sparse Autoencoders on Language Models
https://jbloomaus.github.io/SAELens/
MIT License
481 stars 127 forks

[Proposal] Rename `l1_coefficient` to `sparsity_coefficient` #360

Open chanind opened 2 weeks ago

chanind commented 2 weeks ago

Proposal

Now that we support JumpReLU training, the name l1_coefficient is confusing, since JumpReLU uses an L0 loss, not L1, for training. We should rename this parameter to sparsity_coefficient, since it is a coefficient that promotes sparsity in general. We should also rename l1_warmup_steps to sparsity_warmup_steps.
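For illustration, a rename like this could keep old configs working by accepting the old names as deprecated aliases. The sketch below is only an assumption about how that might look: `TrainingConfigSketch`, its `_resolve` helper, and the default values are hypothetical and are not the actual SAELens config class.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainingConfigSketch:
    """Hypothetical stand-in for a training config (not the real SAELens class)."""

    sparsity_coefficient: Optional[float] = None
    sparsity_warmup_steps: Optional[int] = None
    # Deprecated aliases, kept so that existing configs still load.
    l1_coefficient: Optional[float] = None
    l1_warmup_steps: Optional[int] = None

    def __post_init__(self) -> None:
        # The defaults (1e-3 and 0) are assumed values for this sketch only.
        self.sparsity_coefficient = self._resolve(
            "sparsity_coefficient", "l1_coefficient", default=1e-3
        )
        self.sparsity_warmup_steps = self._resolve(
            "sparsity_warmup_steps", "l1_warmup_steps", default=0
        )

    def _resolve(self, new_name: str, old_name: str, default):
        """Return the value for new_name, falling back to the deprecated old_name."""
        new_value = getattr(self, new_name)
        old_value = getattr(self, old_name)
        if old_value is not None:
            if new_value is not None:
                raise ValueError(f"Pass only `{new_name}`, not both it and `{old_name}`.")
            warnings.warn(
                f"`{old_name}` is deprecated; use `{new_name}` instead.",
                DeprecationWarning,
                stacklevel=3,
            )
            return old_value
        return default if new_value is None else new_value
```

With this approach, `TrainingConfigSketch(l1_coefficient=0.1)` would still work but emit a `DeprecationWarning`, while passing both the old and the new name would fail loudly.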

Motivation

It is confusing to see l1_coefficient used for JumpReLU training, which doesn't use L1 loss.

Alternatives

Alternatively, we could add separate l0_coefficient / l0_warmup_steps parameters that are only used for JumpReLU training, and raise an error if l1_coefficient is provided. This would also potentially allow training a JumpReLU SAE with both L0 and L1 loss if desired.
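A rough sketch of what this alternative's validation might look like is below. Everything in it is hypothetical: `SeparateCoefficientsSketch`, the `allow_l1_with_jumprelu` opt-in flag, and `sparsity_penalty` are made up for illustration and do not reflect existing SAELens code. The flag is one assumed way to reconcile "error if l1_coefficient is provided" with optionally training on both losses.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeparateCoefficientsSketch:
    """Hypothetical config with separate L1 and L0 coefficients."""

    architecture: str = "standard"  # assumed values: "standard" or "jumprelu"
    l1_coefficient: Optional[float] = None
    l0_coefficient: Optional[float] = None
    # Assumed opt-in switch for combining L0 and L1 loss in JumpReLU training.
    allow_l1_with_jumprelu: bool = False

    def __post_init__(self) -> None:
        if self.architecture == "jumprelu":
            if self.l0_coefficient is None:
                raise ValueError("JumpReLU training requires `l0_coefficient`.")
            if self.l1_coefficient is not None and not self.allow_l1_with_jumprelu:
                raise ValueError(
                    "`l1_coefficient` is not used for JumpReLU training; "
                    "use `l0_coefficient` instead."
                )
        elif self.l0_coefficient is not None:
            raise ValueError("`l0_coefficient` is only used for JumpReLU training.")


def sparsity_penalty(
    cfg: SeparateCoefficientsSketch, l1_term: float, l0_term: float
) -> float:
    """Combine whichever sparsity terms the config enables into one penalty."""
    penalty = 0.0
    if cfg.l1_coefficient is not None:
        penalty += cfg.l1_coefficient * l1_term
    if cfg.l0_coefficient is not None:
        penalty += cfg.l0_coefficient * l0_term
    return penalty
```

The trade-off compared with a single sparsity_coefficient is more parameters to validate, in exchange for names that say exactly which loss each coefficient scales.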

muyo8692 commented 1 week ago

Hi @chanind! I've created PR #376 that implements one of the alternatives you suggested - adding a separate l0_lambda parameter specifically for JumpReLU training.

The PR:

Would love to get your thoughts on this implementation!