pzivich / Delicatessen

Delicatessen: the Python one-stop sandwich (variance) shop 🥪
https://deli.readthedocs.io/en/latest/index.html
MIT License

Twice differentiable LASSO #36

Open pzivich opened 4 months ago

pzivich commented 4 months ago

Is your feature request related to a problem? Please describe.

The current LASSO is technically invalid with the sandwich variance, because the score is not differentiable at zero. There are alternative penalties that mimic the LASSO but have a score that is differentiable everywhere. One of these might be better to include as the default option (the existing LASSO would still be maintained, but it would be good to have a valid version as the default).
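To make the problem concrete, here is a minimal sketch (my own illustration, not library code): the LASSO contribution to the score involves sign(beta), and the derivative of the score, which the bread matrix needs, does not exist at zero.

import numpy as np

beta = np.array([-0.1, 0.0, 0.1])
penalty_score = np.sign(beta)   # LASSO penalty contribution to the score
# The bread requires d/d(beta) of the score. For sign(beta) that derivative
# is 0 away from zero and undefined at beta = 0, so the sandwich breaks
# exactly when a coefficient is shrunk to zero.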

Describe the solution you'd like

There are various penalties that could be added besides the bridge penalty. For the LASSO specifically, the dLASSO is one option:

https://arxiv.org/pdf/1609.04985.pdf
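For intuition, here is a small sketch (my own, not code from the paper) of the smooth stand-in for the sign function that the example further down uses; it matches np.sign away from zero but is differentiable through it.

import numpy as np
from delicatessen.utilities import standard_normal_cdf, standard_normal_pdf

def smooth_sign(b, s=1e-5):
    # Twice-differentiable surrogate for sign(b); tends to sign(b) as s -> 0
    return 2*standard_normal_cdf(b/s) + 2*(b/s)*standard_normal_pdf(b/s) - 1

b = np.array([-1.0, -1e-6, 0.0, 1e-6, 1.0])
print(np.sign(b))       # hard sign: jumps at zero
print(smooth_sign(b))   # same tails, but a continuous transition at zero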

The only issue is that this penalty will be biased. Some SCAD-dLASSO hybrid would be ideal, as it would address both the twice-differentiability and the unbiasedness of the penalization. I haven't seen this proposed, so I would either need to find such a paper or put out a methods paper on it first.

Describe alternatives you've considered

Leave as-is. Don't add any more penalized regression.

Additional context

None

pzivich commented 4 months ago

More on SCAD and what the penalty looks like

https://andrewcharlesjones.github.io/journal/scad.html
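For reference, the SCAD derivative from Fan & Li (2001), which is the piece that would enter the score (my transcription; check against the post above):

import numpy as np

def scad_score_term(b, lam, a=3.7):
    # p'_lam(|b|) * sign(b) for the SCAD penalty (Fan & Li 2001)
    ab = np.abs(b)
    deriv = np.where(ab <= lam,
                     lam,
                     np.maximum(a*lam - ab, 0) / (a - 1))
    return deriv * np.sign(b)

Note this is still not differentiable at zero (or at the knots lam and a*lam), which is why a smoothed version would be needed here.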

pzivich commented 4 months ago

Below is an experimental example of the dLASSO.

import numpy as np
import pandas as pd
from delicatessen import MEstimator
from delicatessen.estimating_equations import ee_ridge_regression
from delicatessen.utilities import standard_normal_cdf, standard_normal_pdf

# Simulated data: only X1 and X2 truly predict Y; X3-X9 are noise
np.random.seed(8509353)
n = 50
d = pd.DataFrame()
for j in range(1, 10):
    d['X' + str(j)] = np.random.normal(size=n)
d['Y'] = 5 + d['X1'] - 0.5*d['X2'] + np.random.normal(size=n)
d['I'] = 1  # intercept column
x_cols = ['I', 'X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9']

def psi(theta):
    # Linear regression score with a hand-coded dLASSO penalty
    beta = np.asarray(theta)[:, None]
    y = np.asarray(d['Y'])[:, None]
    X = np.asarray(d[x_cols])

    pred_y = np.dot(X, beta)
    s = 1e-5                                       # smoothing scale
    penalty = np.asarray([0., ] + [5., ]*9) / n    # intercept unpenalized; scaled by n
    # Smooth, everywhere-differentiable stand-in for sign(beta)
    penalty_terms = penalty[:, None] * (2*standard_normal_cdf(beta/s)
                                        + 2*(beta/s)*standard_normal_pdf(beta/s) - 1)

    ee_reg = ((y - pred_y) * X).T - penalty_terms
    return ee_reg

estr = MEstimator(psi, init=[0., 1., -0.5, ] + [0., ]*7)
estr.estimate(deriv_method='exact')

# For comparison, ridge with the built-in estimating equation
def psi(theta):
    return ee_ridge_regression(theta=theta, y=d['Y'], X=d[x_cols],
                               model='linear', penalty=[0., ] + [5., ]*9)

estr = MEstimator(psi, init=[0., ]*len(x_cols))
estr.estimate(deriv_method='exact')
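After either fit, the point estimates and the sandwich variance come off the estimator in the usual way:

print(estr.theta)                    # penalized point estimates
print(np.diag(estr.variance))        # sandwich variance estimates
print(estr.confidence_intervals())   # Wald-type 95% intervals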

I still need to understand the operating characteristics better.

There is also a dSCAD that can be implemented.
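I have not seen a dSCAD written out, but one possible construction (purely my own sketch, smoothing the indicator and sign in the SCAD derivative with the same normal-CDF device) could look like:

import numpy as np
from delicatessen.utilities import standard_normal_cdf

def dscad_score_term(b, lam, a=3.7, s=1e-5):
    # Hypothetical smooth SCAD score term: indicator and sign replaced by
    # normal-CDF approximations (the max(., 0) kink would also need smoothing)
    ab = np.abs(b)
    w = standard_normal_cdf((lam - ab)/s)            # smooth I(|b| <= lam)
    flat = np.maximum(a*lam - ab, 0) / ((a - 1)*lam)
    sgn = 2*standard_normal_cdf(b/s) - 1             # smooth sign(b)
    return lam * (w + (1 - w)*flat) * sgn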

pzivich commented 3 months ago

Okay, I was not too excited about the dLASSO at first, but I am coming around a bit. I think it solves some of the issues of the standard LASSO in this context. My continuing hesitation is that there is only a single paper on it, and I'm not sure how well vetted the theory is.

It is probably fine, but I am going to have to study it more myself...