jacobbien / trac

Tree-based Aggregation of Compositional Data

cannot fit for only a single value for lambda #9

Status: Open. mmp3 opened this issue 3 years ago.

mmp3 commented 3 years ago

Suppose I have already chosen a value for the penalization parameter lambda, and I want to fit trac() on a dataset using only that value.

This situation arises, for instance, when I want to run cross-validation to select lambda outside of trac().

Suppose I want to use lambda = 0.15. Starting from the vignette, the following attempt raises an error in c-lasso:

```r
# from the vignette:
library(trac)
names(sCD14)
set.seed(123)
ntot <- length(sCD14$y)
n <- round(2/3 * ntot)
tr <- sample(ntot, n)
log_pseudo <- function(x, pseudo_count = 1) log(x + pseudo_count)
ytr <- sCD14$y[tr]
yte <- sCD14$y[-tr]
ztr <- log_pseudo(sCD14$x[tr, ])
zte <- log_pseudo(sCD14$x[-tr, ])

# use fraclist to select a single value for lambda.
fit <- trac(ztr, ytr, A = sCD14$A, fraclist = 0.15) # lambda = 0.15 is my favorite.
```

The error message is:

```
Error in py_call_impl(callable, dots$args, dots$keywords) :
  TypeError: 'float' object is not subscriptable

Detailed traceback:
  File "/data1/packages/python_3.8_venv_20210618/lib/python3.8/site-packages/classo/solver.py", line 133, in solve
    self.solution.PATH = solution_PATH(
  File "/data1/packages/python_3.8_venv_20210618/lib/python3.8/site-packages/classo/solver.py", line 845, in __init__
    out = pathlasso(
  File "/data1/packages/python_3.8_venv_20210618/lib/python3.8/site-packages/classo/compact_func.py", line 208, in pathlasso
    if lambdas[0] < lambdas[-1]:
```

It looks like the call fails because c-lasso requires at least two values of lambda.

Supporting that hypothesis, the same error is raised if I run trac() as in the vignette but set nlam = 1:

```r
fit <- trac(ztr, ytr, A = sCD14$A, min_frac = 1e-2, nlam = 1)
```
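The last frame of the traceback evaluates `lambdas[0] < lambdas[-1]`, so it needs something indexable. reticulate converts a length-one R atomic vector into a Python scalar rather than a list, so a single value forwarded from R plausibly reaches c-lasso as a bare `float`, and indexing a `float` raises exactly the TypeError above. The Python sketch below reproduces that failure in isolation. `compare_endpoints` and `as_lambda_list` are hypothetical names, not part of c-lasso or trac, and whether c-lasso would accept a one-element list further down its solver is not tested here.

```python
def compare_endpoints(lambdas):
    """Hypothetical stand-in for the check at compact_func.py line 208,
    `if lambdas[0] < lambdas[-1]:`, which indexes its argument."""
    return lambdas[0] < lambdas[-1]


def as_lambda_list(lambdas):
    """Hypothetical guard: wrap a bare number in a list so it can be indexed."""
    if isinstance(lambdas, (int, float)):
        return [float(lambdas)]
    return [float(v) for v in lambdas]


if __name__ == "__main__":
    try:
        # A bare float, which is how a length-one R vector arrives via reticulate.
        compare_endpoints(0.15)
    except TypeError as err:
        print("TypeError:", err)
    # Wrapped in a one-element list, the comparison no longer raises.
    print(compare_endpoints(as_lambda_list(0.15)))
```

With one element, `lambdas[0]` and `lambdas[-1]` are the same entry, so the comparison simply evaluates to `False` instead of raising.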
mmp3 commented 3 years ago

A simple workaround is to pass a dummy fraclist that adds a second, slightly smaller value, and then keep only the result for the value you want.

The following works with the vignette example from above:

```r
my_lambda <- 0.15
# the first entry is the value of interest; the second is only padding
fraclist <- c(my_lambda, my_lambda * 0.99)
fit <- trac(ztr, ytr, A = sCD14$A, fraclist = fraclist)
```

Note that fraclist must be in descending order; otherwise trac() throws an error. This requirement is not documented.
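The padding trick can be captured in a small helper. This is a Python sketch with hypothetical names (`dummy_fraclist`, `is_strictly_descending`); it only builds and checks the list, and in R the equivalent is the `c(my_lambda, my_lambda * 0.99)` call above. The 0.99 factor is the one used in the workaround; for a positive target, any factor strictly between 0 and 1 keeps the list descending.

```python
def dummy_fraclist(target, shrink=0.99):
    """Two-element, strictly descending list whose first entry is `target`.

    The second entry is only padding so that more than one value is passed;
    the value of interest stays at position 0 (the first element).
    """
    if target <= 0:
        raise ValueError("target must be positive")
    if not 0 < shrink < 1:
        raise ValueError("shrink must lie strictly between 0 and 1")
    return [target, target * shrink]


def is_strictly_descending(values):
    """True when every entry is larger than the entry that follows it."""
    return all(a > b for a, b in zip(values, values[1:]))
```

For example, `dummy_fraclist(0.15)` returns `[0.15, 0.15 * 0.99]`, and `is_strictly_descending` on that list is `True`, while the reversed list fails the check.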

muellsen commented 3 years ago

@mmp3 Thanks for raising this issue! Indeed, we did not beta-test this case, and we will fix it. The issue stems from what you point out: in c-lasso we use a path algorithm to solve the underlying optimization problem. It is, of course, possible to solve for a single lambda, and we can include this option. In terms of speed, your workaround above should be fine, since the path algorithm will, at least in the sparse part of the aggregation path, be as fast in returning all values as in returning a single value. We'll update this issue once single-lambda trac computations are possible.
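The point about path solvers can be illustrated with a much simpler method. The sketch below is a swapped-in technique, not c-lasso's path algorithm: cyclic coordinate descent for the plain lasso (no zero-sum constraint, no tree aggregation), where each solve on a descending lambda grid is warm-started from the previous solution. With a short two-value grid like the workaround's, the second solve typically starts close to its answer, so requesting the path costs little more than one solve. `soft_threshold`, `lasso_cd` and `lasso_path` are hypothetical names.

```python
# Plain lasso objective: (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.
# Illustration only: NOT c-lasso's solver, and it ignores trac's
# zero-sum constraint and tree-based aggregation.


def soft_threshold(z, t):
    """Proximal operator of t * |.|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta=None, tol=1e-10, max_iter=10_000):
    """Cyclic coordinate descent for the lasso at a single lam.

    `beta` is an optional warm start; returns the coefficient list.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p if beta is None else list(beta)
    # Residual r = y - X beta, updated after every coordinate move.
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    # (1/n) * ||x_j||^2 for each column j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # skip all-zero columns: they cannot reduce the loss
            # Correlation of x_j with the partial residual that excludes x_j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def lasso_path(X, y, lambdas):
    """Solve along a strictly descending lambda grid, warm-starting each solve."""
    if any(a <= b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be strictly descending")
    path, beta = [], None
    for lam in lambdas:
        beta = lasso_cd(X, y, lam, beta)
        path.append(beta)
    return path
```

On an orthogonal toy design with (1/n) X^T X = I, the lasso solution is the soft-thresholded least-squares fit, so results can be checked by hand: with X = [[1, 1], [1, -1], [-1, 1], [-1, -1]] and y = [3, 1, -1, -3] the least-squares coefficients are (2, 1), and `lasso_path(X, y, [1.5, 0.5])[-1]` should be close to [1.5, 0.5], matching a cold-start `lasso_cd(X, y, 0.5)`.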

mmp3 commented 3 years ago

OK, thank you for confirming this, @muellsen. I was worried that I had misunderstood the model or the implementation.

And thank you for the great model and package!