scikit-learn-contrib / skglm

Fast and modular sklearn replacement for generalized linear models
http://contrib.scikit-learn.org/skglm
BSD 3-Clause "New" or "Revised" License

Add datafit with log det datafit #265

Open QB3 opened 1 month ago

QB3 commented 1 month ago

Description of the feature

Handle $\log\det$-like datafits and allow variables with multiple roles.

Additional context

@bgirault-inria is currently solving (simpler) graphical-Lasso-like problems and could potentially benefit from our solvers.

The solved problem is

$$
\min_{w \in \mathbb{R}^p,\, q \in \mathbb{R}^m} \log \det \left(\mathrm{diag}(q) + \sum_e w_e b_e b_e^T\right) + A w + B q + \lambda \sum_e w_e + \sum_e \mathbb{I}_{w_e \geq 0} + \sum_i \mathbb{I}_{q_i \geq 0}
$$

with given $A, B, b_e$.

This optimization problem raises two main challenges for our current implementation:

- the $\log\det$ datafit is not a standard GLM datafit: computing its gradient (and Hessian) requires keeping a matrix in memory and updating it efficiently after every coordinate update;
- the variables $w$ and $q$ play different roles: they enter the datafit differently and receive different penalties.

bgirault-inria commented 1 month ago

The overall cost optimization problem can be rewritten in the following form to have a single set of variables:

$$
\min_{\mathbf{x}\in\mathbb{R}^p}
    -\log\det\left(\mathbf{C} + \sum_i x_i \mathbf{a}^{(i)}{\mathbf{a}^{(i)}}^T \right)
    + \sum_i \rho_i x_i
    + \sum_i \mathbb{I}_{x_i\geqslant 0}
\text{,}
$$

where $\forall i,\rho_i>0$.

Adding an $\ell_1$ regularization on a subset of the variables via the term $\lambda \sum_{i\in \mathcal{E}} x_i$ (where $\mathcal{E}$ is a subset of variable indices) is then equivalent to adding $\lambda$ to all parameters $\rho_i$ with $i\in\mathcal{E}$. The form of the problem is unaltered, and we avoid having different roles for the variables (@QB3 2nd bullet point).
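
Spelling out the equivalence in one line:

$$
\sum_i \rho_i x_i + \lambda \sum_{i\in\mathcal{E}} x_i = \sum_i \rho_i' x_i ,
\qquad
\rho_i' = \begin{cases}\rho_i + \lambda & \text{if } i\in\mathcal{E}\\ \rho_i & \text{otherwise,}\end{cases}
$$

so the problem keeps exactly the same form with $\rho$ replaced by $\rho'$.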

Above, the $\log\det$ term is actually a regularization term (penalty), while $\rho_i x_i$ is a data fitting term. If necessary, a simple variable scaling of $\mathbf{x}$ lets us swap the two interpretations and obtain an equivalent $\ell_1$-regularized minimization problem:

$$
\min_{\mathbf{\tilde x}\in\mathbb{R}^p}
    -\log\det\left(\mathbf{C} + \sum_i \tilde x_i \mathbf{\tilde a}^{(i)}{\mathbf{\tilde a}^{(i)}}^T \right)
    + ||\mathbf{\tilde x}||_{1}
    + \sum_i \mathbb{I}_{{\tilde x}_i \geqslant 0}
\text{.}
$$
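
One explicit scaling that achieves this swap (an assumption on my part, the comment above does not spell it out) is $\tilde x_i = \rho_i x_i$ together with $\mathbf{\tilde a}^{(i)} = \mathbf{a}^{(i)} / \sqrt{\rho_i}$, since then

$$
\tilde x_i \, \mathbf{\tilde a}^{(i)}{\mathbf{\tilde a}^{(i)}}^T = x_i \, \mathbf{a}^{(i)}{\mathbf{a}^{(i)}}^T
\qquad\text{and}\qquad
\sum_i \rho_i x_i = \sum_i \tilde x_i = ||\mathbf{\tilde x}||_1
$$

(the last equality uses $\tilde x_i \geqslant 0$).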

Interestingly, with either form of the problem, coordinate descent has a closed-form solution for a single-variable update, and does not require computing a gradient (nor a Hessian, which is essentially free to compute from the gradient anyway). However, whether we compute the gradient and Hessian or the closed-form optimum, we need to keep a matrix in memory for these computations, and update it efficiently each time the solution changes (@QB3 1st bullet point).
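
For reference, writing $f(\mathbf{x}) = -\log\det\big(\mathbf{C} + \sum_i x_i \mathbf{a}^{(i)}{\mathbf{a}^{(i)}}^T\big)$ and $\Phi$ for the inverse of the matrix inside the $\log\det$ (a derivation added here, not part of the original comment), the per-coordinate gradient and Hessian are

$$
\frac{\partial f}{\partial x_i} = -{\mathbf{a}^{(i)}}^T \Phi \, \mathbf{a}^{(i)}
\qquad\text{and}\qquad
\frac{\partial^2 f}{\partial x_i^2} = \left({\mathbf{a}^{(i)}}^T \Phi \, \mathbf{a}^{(i)}\right)^2 ,
$$

which is why the Hessian is essentially free once the gradient is known.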

Would this be possible with skglm?

mathurinm commented 1 month ago

@bgirault-inria is the matrix to store the inverse of $C + \sum_i x_i a_i a_i^\top$? That you can update with Sherman–Morrison after a change of a single coordinate of $x$.
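
For completeness, the rank-one update in question is the standard Sherman–Morrison identity: if $\Phi = M^{-1}$ and the coordinate $x_i$ changes by $\delta$, then

$$
\left(M + \delta \, a_i a_i^\top\right)^{-1} = \Phi - \frac{\delta \, \Phi a_i a_i^\top \Phi}{1 + \delta \, a_i^\top \Phi a_i} ,
$$

which costs only a matrix–vector product and an outer product per coordinate update, instead of a full inversion.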

Do you have a ref for the exact coordinate update? Because even in dimension 1, I didn't immediately see a closed form for $\min_x - \log(c + a^2 x) + \lambda |x|$ subject to $x \geq 0$. OK, you just have to compute the unconstrained solution, and if it does not lie in $\mathbb{R}_+$, then 0 is the solution.
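
Spelling out that 1-D case (assuming $a \neq 0$, $c > 0$, $\lambda > 0$; on $x \geq 0$ the term $\lambda|x|$ is just $\lambda x$): setting the derivative to zero gives

$$
-\frac{a^2}{c + a^2 x} + \lambda = 0
\;\Longleftrightarrow\;
x = \frac{1}{\lambda} - \frac{c}{a^2} ,
\qquad\text{hence}\qquad
x^\star = \max\left(0, \frac{1}{\lambda} - \frac{c}{a^2}\right)
$$

by convexity of the objective.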

I doubt it directly fits the API of skglm (which is oriented towards GLMs), but the skglm algorithm (working sets + Anderson acceleration) can definitely be applied to it, and we can code it!

bgirault-inria commented 1 month ago

> @bgirault-inria is the matrix to store the inverse of ...?

Yes, it is. Let's denote it $\Phi$. The optimal update is then $x_i \leftarrow \max\left(0, x_i + \rho_i^{-1} - \left({\mathbf{a}^{(i)}}^T\Phi\,\mathbf{a}^{(i)}\right)^{-1}\right)$
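
A minimal NumPy sketch of one coordinate-descent pass built on this update (an illustration under the notation above, not skglm code; the function name `cd_pass` and the variable layout are assumptions), combining the closed-form coordinate optimum with a Sherman–Morrison refresh of $\Phi$:

```python
import numpy as np

def cd_pass(x, A, rho, Phi):
    """One coordinate-descent pass on
        min_{x >= 0}  -log det(C + sum_i x_i a_i a_i^T) + sum_i rho_i x_i.

    x   : (p,) current non-negative iterate
    A   : (n, p) matrix whose i-th column is a^(i)
    rho : (p,) positive linear coefficients
    Phi : (n, n) inverse of C + sum_i x_i a_i a_i^T, kept in sync with x
    """
    for i in range(x.shape[0]):
        a_i = A[:, i]
        Phi_a = Phi @ a_i
        s = a_i @ Phi_a                                   # a^(i)^T Phi a^(i)
        x_new = max(0.0, x[i] + 1.0 / rho[i] - 1.0 / s)   # closed-form coordinate optimum
        delta = x_new - x[i]
        if delta != 0.0:
            # Sherman-Morrison: (M + delta a a^T)^{-1} = Phi - delta (Phi a)(Phi a)^T / (1 + delta a^T Phi a)
            Phi -= (delta / (1.0 + delta * s)) * np.outer(Phi_a, Phi_a)
            x[i] = x_new
    return x, Phi
```

`Phi` would be initialized once as `np.linalg.inv(C + A @ np.diag(x) @ A.T)` before the first pass; each subsequent pass then only needs matrix–vector products.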

References: