LP within `get_lambda_max` (silently) fails converge to a solution

huisaddison commented 1 year ago

Reproducing example:

library(quantgen)

set.seed(1)
n = 100
p = 30
d = get_diff_mat(p, 3)

X = matrix(rnorm(n*p), nrow=n)
y = rnorm(n)

get_lambda_max(X, y, d=d) # Returns 72.50772

get_lambda_max(X, y, d=d, lp_solver='gurobi') # Returns NULL

If one inspects the returned solution object a following the lines https://github.com/ryantibs/quantgen/blob/59718548fd6b327b5f1870f9c86fb079cc500b70/quantgen/R/quantile_genlasso.R#L557-L573 one will find that with GLPK,

Called from: get_lambda_max(X, y, d = d)
Browse[1]> print(a$status)
[1] 1

and with Gurobi,

Called from: get_lambda_max(X, y, d = d, lp_solver = "gurobi")
Browse[1]> print(a$status)
[1] "INFEASIBLE"

The failure is more obvious under Gurobi because the solution will be NULL, whereas with Rglpk the supposed solution is the final iterate before it quits.

Thanks @jeremy-goldwasser for catching this

huisaddison commented 1 year ago

Next steps:

Is the implementation correct? (i.e., are we forming the LP determining lambda_max correctly?)
And if it is correct, do we expect Gurobi/GLPK to fail to find the solution often? What safety logic should we put in / heuristics can we use instead for a default large value of lambda?

huisaddison commented 1 year ago

Just adding some notes to this so we don't forget (last discussed on 22-Dec-2022):

Current implementation is based on a heuristic that we don't have documentation for; and was only ever meant to give an approximate value of lambda_max.
We can replace it with a different kind of heuristic, e.g., start with lambda_max in the least squares case (which can be obtained in closed form), apply an inflation factor to account for the least absolute deviations case. And maybe "explore", doubling lambda_max until we find that the active set is empty (check the empty active set against the KKT conditions)

ryantibs / quantgen

LP within `get_lambda_max` (silently) fails converge to a solution #27