jolars / SLOPE

Sorted L1 Penalized Estimation
https://jolars.github.io/SLOPE
GNU General Public License v3.0

Reconsider penalty scaling for SLOPE #11

Open jolars opened 4 years ago

jolars commented 4 years ago

In SLOPE version 0.3.0 and above, the penalty in the SLOPE objective is scaled depending on the type of scaling used in the call to SLOPE().
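
For concreteness, the objective in question can be written with a generic penalty-scaling factor (a sketch only; $c(n)$ stands in for whatever factor a given scaling type implies, e.g. $1$, $\sqrt{n}$, or $n$):

```latex
\min_{\beta \in \mathbb{R}^p}
  \frac{1}{2} \lVert y - X\beta \rVert_2^2
  + c(n) \sum_{j=1}^{p} \lambda_j \lvert \beta \rvert_{(j)},
\quad \text{where } \lvert \beta \rvert_{(1)} \ge \lvert \beta \rvert_{(2)} \ge \dots \ge \lvert \beta \rvert_{(p)}
```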

There are advantages and disadvantages of doing this kind of scaling, and I think a discussion is warranted regarding what the correct behavior should be.

Pros

Cons

Possible solutions

Whichever way we go with this, I think we should keep the other option available as a toggle, i.e. add an argument along the lines of penalty_scaling to turn penalty scaling on or off, or even to provide a more fine-grained type of penalty scaling (see the sketch below). That way, either behavior would be achievable, which means this discussion is really about what the default should be.
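
Something like this, say (purely hypothetical: penalty_scaling is just a proposed name, not an existing argument, and the values shown are only possibilities):

```r
library(SLOPE)

x <- matrix(rnorm(100 * 2), 100, 2)
y <- rnorm(100)

# Hypothetical toggle proposed in this issue; not a released argument.
fit_off <- SLOPE(x, y, penalty_scaling = "none")
# A more fine-grained variant could name the factor itself:
fit_sqrt <- SLOPE(x, y, penalty_scaling = "sqrt_n")
```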

Thoughts? Ideas?

References

Hastie et al. (2015) mention that scaling with n is "useful for cross-validation" and makes lambda values comparable across different sample sizes, but otherwise don't seem to discuss it.
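
For instance, glmnet divides the squared-error loss by the number of observations, which is equivalent to scaling the penalty by n. A small sketch of why that makes a single lambda comparable across sample sizes:

```r
library(glmnet)
set.seed(1)

n <- 100
p <- 10
x <- matrix(rnorm(2 * n * p), 2 * n, p)
y <- drop(x %*% rnorm(p) + rnorm(2 * n))

# glmnet minimizes (1/(2n)) * ||y - X beta||^2 + lambda * penalty, so the
# same lambda value gives comparably regularized fits at n and 2n
# observations -- convenient when lambda is picked by cross-validation.
fit_half <- glmnet(x[seq_len(n), ], y[seq_len(n)], lambda = 0.1)
fit_full <- glmnet(x, y, lambda = 0.1)
```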

scikit-learn has a brief article covering these things here: https://scikit-learn.org/stable/auto_examples/svm/plot_svm_scale_c.html

JonasWallin commented 4 years ago

As the default, I would use the same as glmnet? I agree that it should definitely be an option. Could you put in some references to what people are doing in different places?

jolars commented 4 years ago

As the default, I would use the same as glmnet? I agree that it should definitely be an option. Could you put in some references to what people are doing in different places?

I updated the post with a couple of references, but I'm having a hard time finding more on this topic.

JonasWallin commented 4 years ago

Could you start an Overleaf document for this as well? We should write down the equations so we can have a clearer discussion about them. Furthermore, the naming should refer to the scaling, not the loss function, in my opinion. I.e., 'l1' should be 'none', and then if we implement an 'l1' loss, we can say that the default scaling there is 'none'?

jolars commented 4 years ago

Could you start an Overleaf document for this as well? We should write down the equations so we can have a clearer discussion about them.

Yes, absolutely.

Furthermore, the naming should refer to the scaling, not the loss function, in my opinion. I.e., 'l1' should be 'none', and then if we implement an 'l1' loss, we can say that the default scaling there is 'none'?

I'm not exactly sure what you mean here.

JonasWallin commented 3 years ago

I'm not exactly sure what you mean here.

scaling = "l1", no scaling is applied. The scaling is should not be named after lose function so rather. scaling = 'none'.