pneuvial / c3co

Inferring cancer cell clonality from copy-number data
5 stars 1 forks source link

Argument 'type' in call to 'lsei' in get.W: type= 1 calls LINPACK solver; type = 2 calls QP solver with unexpected ridge regularization #58

Closed jchiquet closed 5 years ago

jchiquet commented 5 years ago

I don't know why the QP solver was chosen at some point, but it seems significantly slower:

library(limSolve)

# problem dimension
J <- 11L  ## Number of segments
K <- 4L   ## Number of subclones
n <- 4L   ## Number of samples

# data generation
Z <- matrix(1, nrow = K, ncol = J)
Z[2,    2] <- 2
Z[3,  5:6] <- 2
Z[4, 9:10] <- 2
Zt <- t(Z)

W <- matrix(0, nrow = n, ncol = K)
W[1,1] <- W[2,2] <- W[3,3] <- W[4,4] <- 1
Y <- W %*% Z + matrix(rnorm(n*J, sd = 0.1), nrow = n, ncol = J)

# problem resolution
E <- matrix(rep(1, times = K), nrow = 1L, ncol = K)
H <- double(K)
G <- diag(K)

res <- microbenchmark::microbenchmark(
  lsei = apply(Y, MARGIN = 1L, FUN = function(y) {
    lsei(A = Zt, B = y, E = E, F = 1L, H = H, G = G, type = 1L)$X
  }),
  quadprog = apply(Y, MARGIN = 1L, FUN = function(y) {
    lsei(A = Zt, B = y, E = E, F = 1L, H = H, G = G, type = 2L)$X
  })
)

Then it gives

> summary(res)
      expr     min       lq     mean  median       uq      max neval
1     lsei 290.643 301.1630 328.2604 315.252 318.8810 1828.098   100
2 quadprog 363.585 376.2815 405.4294 389.690 395.7745 2259.158   100

So I suggest changing type = 2L to 1L in get.W

jchiquet commented 5 years ago

More about this : in the lsei function, a call to the second type induces a ridge-like regularization, which may explain why this solution was found more stable by @mpierrejean in commit ce6c2f72e78099329cb421d6ee7d1d74e17031ed

   else if (type == 2) {
        if (!is.null(Wx)) 
            stop("cannot solve least squares problem - weights not implemented for type 2")
        if (!is.null(Wa)) 
            stop("cannot solve least squares problem - weights not implemented for type 2")
        dvec <- crossprod(A, B)
        Dmat <- crossprod(A, A)
        diag(Dmat) <- diag(Dmat) + 1e-08
        Amat <- t(rbind(E, G))
        bvec <- c(F, H)
        sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = Neq)
        sol$IsError <- FALSE
        sol$X <- sol$solution
    }

However, it does not solve the expected problem !

jchiquet commented 5 years ago

This problem is related to the rank deficiency problem: we suggest to add an argument to get.W for the type of solver use, using type = 1L as a default since it corresponds to the expected optimization problem.

Later, we might tweak the following to call quadprog by ourselves and then control the amount of ridge regularization (here arbitrarily set to 1e-8):

else if (type == 2) {
        if (!is.null(Wx)) 
            stop("cannot solve least squares problem - weights not implemented for type 2")
        if (!is.null(Wa)) 
            stop("cannot solve least squares problem - weights not implemented for type 2")
        dvec <- crossprod(A, B)
        Dmat <- crossprod(A, A)
        diag(Dmat) <- diag(Dmat) + 1e-08
        Amat <- t(rbind(E, G))
        bvec <- c(F, H)
        sol <- solve.QP(Dmat, dvec, Amat, bvec, meq = Neq)
        sol$IsError <- FALSE
        sol$X <- sol$solution
    }

UPDATE 2019-03-02: Moved to Issue #72.

HenrikBengtsson commented 5 years ago

Don't forget to update NEWS - this glooks like a 'SIGNIFICANT CHANGES:' entry since we're changing the default from type=2L to type=1L.

HenrikBengtsson commented 5 years ago

I've moved the outstanding future task to new Issue #72.