hansenlab / minfi

Devel repository for minfi

estimateCellCounts bug with 1 unique pData variable #175

Open snlent opened 5 years ago

snlent commented 5 years ago

I downloaded some IDAT files from GEO and was preprocessing the methylation data before integrating phenotype data. I got this error with estimateCellCounts():

[estimateCellCounts] Combining user data with reference (flow sorted) data.

Error in `*tmp*`[1, ] : incorrect number of dimensions

I dug into the issue and figured out the cause. When combining the user's RGChannelSet with the reference RGChannelSet, .harmonizeDataFrames() builds two character vectors of column names, x.only and y.only: the phenotype variables found only in the user's RGChannelSet and those found only in the reference package's RGChannelSet, respectively.

.harmonizeDataFrames <- function(x, y) {
  stopifnot(is(x, "DataFrame"))
  stopifnot(is(y, "DataFrame"))
  x.only <- setdiff(names(x), names(y))
  y.only <- setdiff(names(y), names(x))
  if (length(x.only) > 0) {
    df.add <- x[1, x.only]
    is.na(df.add[1, ]) <- TRUE
    y <- cbind(y, df.add)
  }
  if (length(y.only) > 0) {
    df.add <- y[1, y.only]
    is.na(df.add[1, ]) <- TRUE
    x <- cbind(x, df.add)
  }
  list(x = x, y = y[, names(x)])
}

If either x.only or y.only has length 1, the single-column subset is dropped to a plain vector of length 1 (a character vector in my case) instead of staying a DataFrame, so the following df.add[1, ] fails with the error above. I worked around it by adding a dummy variable to my pData() object so that it has 2 unique variables, but the issue could be fixed in the package by replacing df.add <- x[1, x.only] with df.add <- x[1, x.only, drop = FALSE] and df.add <- y[1, y.only] with df.add <- y[1, y.only, drop = FALSE].
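For anyone who wants to see the dropping behavior in isolation, here is a minimal sketch. It uses a base data.frame as a stand-in for the S4Vectors DataFrame that minfi actually passes around, on the assumption that both drop a single selected column to a vector by default; the column names and values are made up for illustration.

```r
# Stand-in phenotype table (hypothetical columns; base data.frame, not DataFrame).
pd <- data.frame(Sample_Name = c("s1", "s2"),
                 Sex = c("F", "M"),
                 stringsAsFactors = FALSE)

# Selecting exactly one column drops the result to a plain vector by default.
dropped <- pd[1, "Sex"]

# drop = FALSE keeps the one-row, one-column table structure.
kept <- pd[1, "Sex", drop = FALSE]

class(dropped)
class(kept)

# Two-subscript indexing on the dropped vector is what triggers the failure.
err <- tryCatch(dropped[1, ], error = function(e) conditionMessage(e))
err

# With the table structure preserved, blanking the row to NA works.
kept[1, ] <- NA
kept
```

The same one-argument change (drop = FALSE) in both branches of .harmonizeDataFrames() should keep df.add two-dimensional regardless of how many unique variables there are.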