Open pythonometrist opened 3 years ago
Do you have an elementwise L1 penalty? If so, you can use operators.prox_soft; adaprox should work then.
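For reference, a minimal sketch of the elementwise soft-thresholding that an L1 prox performs (the actual signature and parameter names of operators.prox_soft in the library may differ; `step` and `thresh` here are assumptions):

```python
import numpy as np

def prox_soft(x, step, thresh=1.0):
    """Soft-thresholding: the proximal operator of thresh * ||x||_1
    with step size `step`. Shrinks each entry toward zero by
    step * thresh and sets small entries exactly to zero."""
    return np.sign(x) * np.maximum(np.abs(x) - step * thresh, 0.0)
```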
Thanks, let me dig into it and get back to you. I am going to evaluate how this compares with a smooth Huber loss for linear regression.
This isn't an issue per se. I wanted to figure out whether I could use a similar approach for a simple LASSO regression in PyTorch. Working with proximal operators and plain SGD is straightforward (but SGD has step-size issues). Adam keeps running averages of past gradients, but it isn't meant for non-differentiable convex problems (even though L1 regularization does improve results a fair bit). I wanted to see if AdaProx improves results.
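For context, a minimal sketch of the proximal-SGD approach mentioned above, in plain PyTorch (the problem sizes, `lam`, and `lr` are illustrative): take a gradient step on the smooth least-squares loss, then soft-threshold the weights.

```python
import torch

torch.manual_seed(0)
n, d, lam, lr = 200, 50, 0.1, 0.01

# Synthetic sparse regression problem (illustrative)
X = torch.randn(n, d)
w_true = torch.zeros(d)
w_true[:5] = torch.randn(5)
y = X @ w_true + 0.01 * torch.randn(n)

w = torch.zeros(d, requires_grad=True)
opt = torch.optim.SGD([w], lr=lr)

for _ in range(1000):
    opt.zero_grad()
    loss = 0.5 * ((X @ w - y) ** 2).mean()  # smooth part only; L1 handled by the prox
    loss.backward()
    opt.step()
    with torch.no_grad():
        # Proximal step: soft-threshold with the same step size as the gradient step
        w.copy_(torch.sign(w) * torch.clamp(w.abs() - lr * lam, min=0.0))
```

Note that naively applying this prox after an Adam step is only an approximation; as I understand it, AdaProx instead applies the prox in the metric induced by Adam's adaptive preconditioner.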