Predict on dummyVars cannot return a sparse matrix #671

Open jeffwong-nflx opened 7 years ago

jeffwong-nflx commented 7 years ago

Hi, I want to construct a dummyVars matrix in sparse format. Because my factor variables have a large number of levels, a sparse result would really help. Here is a reproducible example:


library(caret)
library(data.table)

n = 1000
X = data.table(a = sample(c('a', 'b', 'c'), n, replace = TRUE),
               b = sample(c('d', 'e', 'f'), n, replace = TRUE),
               x = 1:n)
X$a = as.factor(X$a)
X$b = as.factor(X$b)

# Desired usage: ask predict() for a sparse result
predict(foo <- dummyVars(~ a + b, X, sparse = TRUE),
        X, sparse = TRUE)

I believe it would be a simple modification here

This line generates the design matrix with model.matrix. We would simply need to allow passing sparse = TRUE and, in that case, call sparse.model.matrix (from the Matrix package) instead.
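To make the difference concrete, here is a minimal sketch comparing the two builders on the same formula. It assumes only base R and the Matrix package; the data frame and the names dense and sparse are illustrative and not part of caret.

```r
library(Matrix)  # provides sparse.model.matrix()

set.seed(1)
n <- 1000
X <- data.frame(a = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                b = factor(sample(c("d", "e", "f"), n, replace = TRUE)))

dense  <- model.matrix(~ a + b, X)         # ordinary numeric matrix
sparse <- sparse.model.matrix(~ a + b, X)  # sparse matrix from the Matrix package

class(dense)
class(sparse)

# Same columns and values, only the storage differs
all.equal(as.matrix(sparse), dense, check.attributes = FALSE)
```

With only three levels per factor the saving is small; the sparse form should matter more as the number of levels grows, since each row holds one non-zero entry per factor.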

jeffwong-nflx commented 7 years ago

I have attempted to write a solution for this in myfunc, where I rely on sparse.model.matrix:


library(Matrix)  # provides sparse.model.matrix()

myfunc <- function(object, newdata, na.action = na.pass, return_sparse = FALSE, ...) {
  if(is.null(newdata)) stop("newdata must be supplied")
  # Coerce non-data-frame input before looking up variables
  if(!is.data.frame(newdata)) newdata <- as.data.frame(newdata, stringsAsFactors = FALSE)
  # Report the variables that are missing from newdata
  if(!all(object$vars %in% names(newdata))) stop(
    paste(paste("'", object$vars[!object$vars %in% names(newdata)],
                "'", sep = "",
                collapse = ", "),
          "are not in newdata"))
  Terms <- object$terms
  Terms <- delete.response(Terms)
  if(!object$fullRank) {
    # Temporarily use contr.ltfr for unordered factors; restore on exit
    oldContr <- options("contrasts")$contrasts
    newContr <- oldContr
    newContr["unordered"] <- "contr.ltfr"
    options(contrasts = newContr)
    on.exit(options(contrasts = oldContr))
  }
  m <- model.frame(Terms, newdata, na.action = na.action, xlev = object$lvls)

  if (return_sparse) {
    x = sparse.model.matrix(Terms, m)
  } else {
    x = model.matrix(Terms, m)
  }

  # Rename columns to the bare level names
  if(object$levelsOnly) {
    for(i in object$facVars) {
      for(j in object$lvls[[i]]) {
        from_text <- paste0(i, j)
        colnames(x) <- gsub(from_text, j, colnames(x), fixed = TRUE)
      }
    }
  }
  # Otherwise insert the separator between variable name and level
  if(!is.null(object$sep) && !object$levelsOnly) {
    for(i in object$facVars[order(-nchar(object$facVars))]) {
      for(j in object$lvls[[i]]) {
        from_text <- paste0(i, j)
        to_text <- paste(i, j, sep = object$sep)
        colnames(x) <- gsub(from_text, to_text, colnames(x), fixed = TRUE)
      }
    }
  }
  x[, colnames(x) != "(Intercept)", drop = FALSE]
}

n = 1000
X = data.frame(a = sample(c('a', 'b', 'c'), n, replace = TRUE),
               b = sample(c('d', 'e', 'f'), n, replace = TRUE),
               x = 1:n)
X$a = as.factor(X$a)
X$b = as.factor(X$b)

sparse.model.matrix(~ a, X)
foo <- dummyVars(~ ., X, sparse = TRUE, fullRank = FALSE)
bar <- myfunc(foo, X, return_sparse = TRUE, sparse = TRUE)

However, I get this error:

Error in model.spmatrix(t, data, transpose = transpose, drop.unused.levels = drop.unused.levels,  : 
  no slot of name "i" for this object of class "dgeMatrix"

I believe it may be related to switching the contrasts option to contr.ltfr, which may not be compatible with sparse.model.matrix. If I comment that block out, the code executes and returns a sparse matrix, but one in which one level of each factor is dropped (the R default).
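One possible way to keep every level without switching the global contrasts option is to pass full indicator contrasts through the contrasts.arg argument of sparse.model.matrix. This is a sketch of that idiom using only base R and Matrix; it is not something confirmed in this thread as caret's approach, and the names fac_cols, full_ctr and sm are illustrative.

```r
library(Matrix)

set.seed(1)
n <- 1000
X <- data.frame(a = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
                b = factor(sample(c("d", "e", "f"), n, replace = TRUE)))

# One identity-style contrast matrix per factor, so no level is dropped
fac_cols <- names(X)[vapply(X, is.factor, logical(1))]
full_ctr <- lapply(X[fac_cols], contrasts, contrasts = FALSE)

sm <- sparse.model.matrix(~ a + b, X, contrasts.arg = full_ctr)

# Drop the intercept column, as predict.dummyVars does
sm <- sm[, colnames(sm) != "(Intercept)", drop = FALSE]
colnames(sm)
```

The result is not full rank, which matches the fullRank = FALSE case. The column names follow the model.matrix convention of variable name pasted to level, so the sep and levelsOnly renaming would still have to be applied afterwards.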

topepo commented 7 years ago

I have the same (nebulous) issue. Would an object generated by model.Matrix work for you?

jeffwong-nflx commented 7 years ago

Yes, I believe that would work. Did you have a workaround?

topepo commented 7 years ago

x <- model.matrix(Terms, m)

could easily be changed to

x <- if (sparse) {
  model.matrix(Terms, m)
} else {
  sparse.model.matrix(Terms, m)
}

More testing would be needed though.

jeffwong-nflx commented 7 years ago

Invoking model.Matrix from MatrixModels does not work either.

topepo commented 7 years ago

Your example worked for me:

Give the code that I'm about to check in a try.

jeffwong-nflx commented 7 years ago

You have a bug here

When sparse is TRUE, it uses model.matrix, not sparse.model.matrix, so the output of the example is dense.
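For reference, a minimal sketch of the branch with the two calls the right way round. build_design is a hypothetical helper written for this illustration, not caret's actual code.

```r
library(Matrix)

# Hypothetical helper: choose the matrix builder from the sparse flag
build_design <- function(Terms, m, sparse = FALSE) {
  if (sparse) {
    sparse.model.matrix(Terms, m)
  } else {
    model.matrix(Terms, m)
  }
}

X <- data.frame(a = factor(c("a", "b", "c", "a")),
                b = factor(c("d", "e", "f", "d")))
Terms <- terms(~ a + b, data = X)
m <- model.frame(Terms, X)

class(build_design(Terms, m, sparse = TRUE))   # a sparse Matrix class
class(build_design(Terms, m, sparse = FALSE))  # a base matrix
```

Checking the class of the result for both values of the flag is a cheap test that would have caught the swapped branches.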

jeffwong-nflx commented 7 years ago

Bumping this issue :)