kelliejarcher / ordinalbayes

Bayesian Ordinal Regression for High-Dimensional Data

Error when using predict() with new data: non-conformable arguments #1

Open AlinaMH opened 1 month ago

AlinaMH commented 1 month ago

Hi, I really like your approach for modelling ordinal data and am trying it on my data. I have gene expression data and ordinal response data as well as categorical unpenalized factors. I have split my cohort into a training cohort and a validation cohort, but whenever I try using the predict function on the validation set, I get the error:

Error in neww %*% zeta : non-conformable arguments

My training call looks like this:

ordinal_bayes_cat[[v]] <- ordinalbayes(MRD_cat ~ age_group + Fusion, data = trainingData[,c(ma, ncol(trainingData), ncol(trainingData)-1, ncol(trainingData)-2)], x = trainingData[,c(1:length(ma))])

The last three columns of trainingData are the ordinal response and the two categorical factors. "ma" is the vector of genes I am giving as input.

My prediction call for the validation set looks like this. The column names of trainingData and valiData are the same (several genes and the three factor columns):

phat_vali<-predict(ordinal_bayes_cat[[1]], neww = ~ age_group + Fusion, newx = as.matrix(valiData[,c(1:length(ma))]), newdata = as.data.frame(valiData[, c(ma, ncol(valiData), ncol(valiData)-1, ncol(valiData)-2)]) )

Any ideas on why I am getting this error? Some reading I have done suggests that the problem lies in the dimensions of the matrices created inside the function call, but I can't figure out how to fix it. Is it a problem when the training data and the data to be predicted have different sample sizes?
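
For reference, base R raises this error whenever the number of columns of the left matrix differs from the number of rows of the right one in `%*%`. The sketch below reproduces it with toy matrices. The names `neww` and `zeta` are borrowed from the error message only, and their shapes here are invented for illustration; they are not the shapes used inside ordinalbayes.

```r
# Toy stand-ins (assumed shapes, not taken from the package internals).
neww <- matrix(1, nrow = 5, ncol = 3)  # 5 samples, 3 design columns
zeta <- matrix(1, nrow = 2, ncol = 1)  # 2 coefficients

# 3 columns on the left vs. 2 rows on the right: the product is undefined.
msg <- tryCatch({
  neww %*% zeta
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# With one coefficient per design column the product is defined.
zeta_ok <- matrix(1, nrow = ncol(neww), ncol = 1)
dim(neww %*% zeta_ok)
```

Note that the number of rows of the left matrix (the sample size) does not enter the conformability check; only its column count does.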

Help would be much appreciated. Best regards, Alina

kelliejarcher commented 1 month ago

Hi Alina,

The training and validation data can have different sample sizes, but otherwise they need to have the same variables. Both trainingData and valiData should include age_group, Fusion, and the gene columns selected by c(1:length(ma)).
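
One way to check this before calling predict() is to compare the two data frames directly. The following is a minimal sketch using base R only; the toy data frames and the `needed` vector are hypothetical stand-ins for the real trainingData, valiData, and variable list.

```r
# Hypothetical toy data standing in for the real training and validation sets.
trainingData <- data.frame(
  gene1 = c(0.1, 0.4, 0.3, 0.9),
  gene2 = c(1.2, 0.8, 1.1, 0.5),
  age_group = factor(c("young", "old", "young", "old")),
  Fusion = factor(c("yes", "no", "no", "yes"))
)
valiData <- data.frame(
  gene1 = c(0.2, 0.7),
  gene2 = c(0.9, 1.0),
  age_group = factor(c("old", "young")),
  Fusion = factor(c("no", "yes"))
)

# Variables that both data frames are expected to contain (assumed list).
needed <- c("gene1", "gene2", "age_group", "Fusion")
missing_train <- setdiff(needed, colnames(trainingData))
missing_vali  <- setdiff(needed, colnames(valiData))

# For the categorical predictors, also compare class and factor levels.
cats <- c("age_group", "Fusion")
same_class <- sapply(cats, function(v)
  identical(class(trainingData[[v]]), class(valiData[[v]])))
same_levels <- sapply(cats, function(v)
  identical(levels(trainingData[[v]]), levels(valiData[[v]])))

print(list(missing_train = missing_train, missing_vali = missing_vali,
           same_class = same_class, same_levels = same_levels))
```

Different numbers of rows are fine here; the check only concerns which columns exist and how the categorical ones are encoded.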

Is MRD_cat an ordered factor? I was unclear because of your statement, “Where the last three columns of training data are the ordinal response value and the two categorical factors. "ma" refers to a vector of genes I am giving as input.” If your response is an ordered factor, I am wondering if it is a problem with my predict function not structuring a factor variable appropriately. That is, if age_group is a K-level categorical variable, the modeling function will dummy code age_group. Maybe my predict function isn’t recognizing that. To check, maybe replace your age_group variable with K-1 dummy variables in both trainingData and valiData (and the same for Fusion if it is a K-level categorical variable) and see if that fixes the problem. If you try this and it works I would like to know as I could alter the predict function and update the package.
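
The dummy-variable check suggested above could be sketched as follows, assuming a three-level age_group. The data frames, level names, and the helper `make_dummies` are hypothetical; only base R's `model.matrix()` is used, and nothing here reflects how ordinalbayes codes factors internally.

```r
# Hypothetical level set shared by both cohorts (K = 3).
lvls <- c("child", "adolescent", "adult")
train <- data.frame(age_group = factor(c("child", "adolescent", "adult", "child"),
                                       levels = lvls))
vali  <- data.frame(age_group = factor(c("adult", "child"), levels = lvls))

# Illustrative helper: build K-1 treatment-coded dummies, dropping the intercept.
make_dummies <- function(d) {
  model.matrix(~ age_group, data = d)[, -1, drop = FALSE]
}

train_dummies <- make_dummies(train)
vali_dummies  <- make_dummies(vali)

# Both cohorts get the same K-1 columns because they share the same levels.
colnames(train_dummies)
colnames(vali_dummies)

# Replace the factor with its numeric dummy columns in each data frame.
train2 <- cbind(train["age_group"][, 0], as.data.frame(train_dummies))
vali2  <- cbind(vali["age_group"][, 0], as.data.frame(vali_dummies))
```

Fixing `levels = lvls` in both cohorts matters: if the validation set happens to lack a level, the dummy columns would otherwise differ between the two data frames.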

Best, Kellie

AlinaMH commented 2 weeks ago

Hi Kellie, thank you for your super fast reply (and sorry it took me so long to get back to you!). I changed my unpenalized factor "age_group" from ordered to character, and now it works. I suppose the code might have a problem if both the response variable and an unpenalized predictor are ordered? Unfortunately, I realized my data set is too small to make accurate predictions, but I will retry with a larger data set soon and will let you know if that runs well. Cheers, Alina
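
As background on why the ordered/unordered distinction can matter, base R codes the two kinds of factor differently in a design matrix under its default contrasts: unordered factors get treatment (dummy) coding, while ordered factors get polynomial contrasts. The sketch below uses hypothetical level names and only illustrates the base R behaviour; whether this difference is what trips predict() in ordinalbayes is not confirmed in this thread.

```r
lvls <- c("low", "mid", "high")  # hypothetical levels

# Same values, once as an ordered factor and once converted via character.
d_ord <- data.frame(
  age_group = factor(c("low", "mid", "high", "mid"), levels = lvls, ordered = TRUE)
)
d_unord <- data.frame(
  age_group = factor(as.character(d_ord$age_group), levels = lvls)
)

cols_ord   <- colnames(model.matrix(~ age_group, data = d_ord))
cols_unord <- colnames(model.matrix(~ age_group, data = d_unord))

print(cols_ord)    # "(Intercept)" "age_group.L" "age_group.Q"
print(cols_unord)  # "(Intercept)" "age_groupmid" "age_grouphigh"
```

Converting the predictor with `as.character()` (or `factor(..., ordered = FALSE)`) in both the training and validation data, as done above, gives the plain dummy-coded columns.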