kreutz-lab / DIMAR

Data-driven selection of an imputation algorithm in R

Error with coef for datasets with <1000 rows #4

Closed aretaon closed 2 years ago

aretaon commented 2 years ago

Hi Janine,

We have found another problem with DIMA and would very much appreciate your help. When using a small data set with far fewer than 1000 rows, we get the following error:

Error in coef[i, 1:length(coefficients(fit))] <- stats::coefficients(fit) :
  object of type 'closure' is not subsettable
Calls: dimarLearnPattern
Execution halted

To reproduce the error, please find attached a small data file, input.csv. We used the following code snippet for testing:

df <- read.table("input.csv", sep='\t', header=TRUE)
rownames(df) <- df$UID
dfv <- df[,-which(names(df) %in% c("UID"))]
mtx <- as.matrix(dfv)

mtx <- DIMAR::dimarMatrixPreparation(mtx, nacut = 2)

dimarLearnPattern <- function(mtx) {
  # Subsample indices
  if (nrow(mtx) > 1000) {
    nsub <- ceiling(nrow(mtx) / 1000)
    indrand <- sample(1:nrow(mtx), nrow(mtx))
    npersub <- ceiling(nrow(mtx) / nsub)
    coef <- matrix(,nrow=nsub,ncol=(npersub+dim(mtx)[2]+2))
  } else {
    nsub <- 1
    ind <- 1:nrow(mtx)
  }

  for (i in 1:nsub) {
    #  i <- 1
    if (nsub > 1) {
      if (i==nsub) {
        ind <- indrand[(npersub*(i-1)+1):length(indrand)]
      } else {
        ind <- indrand[(npersub*(i - 1) + 1):(npersub*i)]
      }
    }
    design <- DIMAR::dimarConstructDesignMatrix(mtx[ind,])
    design <- DIMAR::dimarConstructRegularizationMatrix(design)

    #fit <- stats::glm.fit(X,y,family=stats::binomial(),weights=rep(1,dim(X)[1]))
    fit <- stats::glm.fit(design$X, design$y, family = stats::binomial())
    coef[i,1:length(coefficients(fit))] <- stats::coefficients(fit)
  }
  # sort row coefficients, intensity/column coefficients are set to mean over nsub (for loop)
  if (nsub > 1) {
    idx <- which(design$Xtype!=3)
    coef <- c(colMeans(coef[,idx]), sort(coef[,setdiff(1:dim(coef)[2],idx)]))
  }

  print('Pattern of MVs is learned by logistic regression.')
  return(coef)
}

coef <- dimarLearnPattern(mtx)
print(coef)

I assume the problem is that the coef variable is not initialised in cases with < 1000 rows, but I am not sufficiently familiar with DIMA to add the proper matrix myself. It would be great if you could have a look at this.
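
The diagnosis can be illustrated in isolation. The following is a minimal sketch, not the change that was made in DIMAR: allocate_coef is a hypothetical helper, and reusing the column count from the >1000-row branch for the single-subsample case is an assumption. When no local coef exists, R resolves the name to the function stats::coef, which is why the subset assignment fails.

```r
# Reproduce the failure without DIMAR: inside the function there is no local
# `coef`, so R finds the function stats::coef and cannot subset-assign into it.
broken <- function() {
  coef[1, 1:2] <- c(0.5, 1.5)
  coef
}
result <- tryCatch(broken(), error = function(e) e)
print(inherits(result, "error"))

# Hypothetical helper (not part of DIMAR): allocate the matrix for any row count.
allocate_coef <- function(n_rows, n_cols, chunk = 1000) {
  nsub <- max(1, ceiling(n_rows / chunk))   # one subsample for <= chunk rows
  npersub <- ceiling(n_rows / nsub)
  # Column count copied from the >1000-row branch; assumed valid for nsub == 1.
  matrix(NA_real_, nrow = nsub, ncol = npersub + n_cols + 2)
}

coef_small <- allocate_coef(n_rows = 500, n_cols = 10)
coef_small[1, 1:3] <- c(0.1, 0.2, 0.3)      # the assignment now works
print(dim(coef_small))
```

Allocating the matrix before the loop in both branches, under any name that does not shadow a base function, would avoid the error for small inputs.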

Cheers

Julian

JanineEgert commented 2 years ago

Dear Julian, thanks for the message! The package has been updated accordingly. You can update it with, e.g., devtools::install_github("kreutz-lab/DIMAR"); then please try again and let me know if it works. Best, Janine