prockenschaub / Misc

Miscellaneous functions in R and python

Error in colMeans(as.matrix(imp[[j]]), na.rm = TRUE) : 'x' must be numeric #1

Open yshahppdi opened 4 years ago

yshahppdi commented 4 years ago

I ran into this error while calling mice.reuse(imp, testdata):

Reproducible Example:

library(dplyr)  # provides %>% and filter() used below
sampledata = data.frame( Group1 = rep( c("A","B"),20) , "Group2" = c(rep("C",20),rep("D",20)))
sampledata["Metric"] = ifelse(sampledata$Group1=="A", ifelse(sampledata$Group2=="C", rnorm(1,5,.01), rnorm(1,10,.01)) ,
                              ifelse(sampledata$Group2=="C", rnorm(1,1,.01), rnorm(1,2,.01)) )
sampledata["Metric2"] = sampledata$Metric*rnorm(length(sampledata$Metric),100,sd=15)
sampledata["Metric1_missing"] = sampledata$Metric + rnorm(40,1,.01)
sampledata["Metric2_missing"] = sampledata$Metric2 + rnorm(40,1,.10)
set.seed(1)
sampledata[["Traintest"]] = ifelse(rbinom(20,1,.5)==0,"Train","Test")
set.seed(1)
sampledata[ sample(1:nrow(sampledata),8,replace = F ),"Metric1_missing"]= NA
sampledata[ sample(1:nrow(sampledata),8,replace = F ),"Metric2_missing"]= NA
testdata = sampledata%>%filter(Traintest=='Test') 
sampledata = sampledata%>%filter(Traintest=='Train')

# sampledata = sampledata%>%
#   mutate_at(c("Group1","Group2"),function(x) recode(x,!!!c("A"=1,"B"=2,"C"=1,"D"=2)))

# define imputation methods
impM <- rep("", ncol(sampledata))
names(impM) <- colnames(sampledata)
impM[ "Metric1_missing" ] <- "pmm"
impM[ "Metric2_missing" ] <- "pmm"

# define predictor matrix
predM_d <- 1 - diag( 0, ncol(sampledata))
rownames(predM_d) <- colnames(predM_d) <- colnames(sampledata)
predM_d[c("Group1","Group2","Metric","Metric2","Traintest"), ] <- 0
predM_d[,c("Metric","Metric2","Traintest")] <- 0
diag(predM_d)<- 0 
predM_d

low <- list("Metric1_missing"=sampledata$Metric , "Metric2_missing" = sampledata$Metric2 )
upp <- list("Metric1_missing"=sampledata$Metric, "Metric2_missing" = sampledata$Metric2)

imp <- mice::mice( sampledata, method=impM, predictorMatrix=predM_d, m=5, maxit=3, allow.na=FALSE  ) ##low=low, upp=upp
imputedtest = mice.reuse(imp, testdata, maxit = 1,printFlag = T,seed=12345)
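The error message itself can be reproduced in base R without mice. The sketch below assumes (this is not confirmed anywhere in the thread) that the failing call receives the imputations of a character column such as Group1. The object imp_j is a hypothetical stand-in for imp[[j]], not something taken from mice.

```r
# Hypothetical stand-in for imp[[j]]: two imputations of a character column.
imp_j <- data.frame(`1` = c("A", "B"), `2` = c("B", "B"),
                    check.names = FALSE, stringsAsFactors = FALSE)

# as.matrix() on an all-character data frame yields a character matrix.
m <- as.matrix(imp_j)
is_char <- is.character(m)

# colMeans() rejects non-numeric input; capture the message instead of stopping.
msg <- tryCatch(
  {
    colMeans(m, na.rm = TRUE)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# The same call works once the imputations are numeric.
imp_num <- data.frame(`1` = c(5, 1), `2` = c(5, 2), check.names = FALSE)
num_means <- colMeans(as.matrix(imp_num), na.rm = TRUE)
```

If that assumption holds, the error comes from the character grouping columns rather than from the two columns that are actually imputed.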

Solution:
1) In all_miss, set the variables that are not supposed to be imputed to FALSE.
2) Reuse the predictor matrix from the mids object that was passed in (instead of going back to the default).

Sample code to fix it:

  all_miss <- matrix(TRUE, rows, cols, dimnames = list(seq_len(rows), nm))
  ## Set all_miss to FALSE for the features that are not to be imputed.
  ## names(mids$method)[mids$method == ""] gives the names of all columns that have an empty imputation method.
  all_miss[, names(mids$method)[mids$method == ""]] <- FALSE

  mids.new <- mice(newdata, m = mids$m, predictorMatrix = mids$predictorMatrix, method = mids$method, where = all_miss, maxit = 0)
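To see what the all_miss matrix looks like after this change, here is a minimal base R sketch. The data frame newdata and the vector method are made-up stand-ins (method plays the role of mids$method), so this only illustrates the masking step, not mice.reuse itself.

```r
# Hypothetical stand-ins for the objects used in the fix above.
newdata <- data.frame(
  Group1 = c("A", "B", "A"),
  Metric = c(5, 1, 5),
  Metric1_missing = c(6, NA, 6)
)
method <- c(Group1 = "", Metric = "", Metric1_missing = "pmm")  # role of mids$method

rows <- nrow(newdata)
nm <- names(newdata)

# Start with every cell flagged for imputation.
all_miss <- matrix(TRUE, rows, length(nm), dimnames = list(seq_len(rows), nm))

# Columns with an empty method are not meant to be imputed, so switch them off.
skip <- names(method)[method == ""]
all_miss[, skip] <- FALSE

# Number of cells per column still flagged for imputation.
print(colSums(all_miss))
```

Only Metric1_missing stays flagged, so a `where` matrix built this way would ask mice to impute that column alone.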

williamty commented 1 year ago

I've tried the sample code @yshahppdi mentioned, but the fix has nothing to do with all_miss: the error simply disappeared after setting maxit to 0, and I don't know why. There is also something odd: all the "FALSE" variables (which have no NAs and don't need to be imputed) are 0 in each imputation.