Open AlinaMH opened 1 month ago
Hi Alina,
The training and validation data can have different sample sizes, but otherwise they need to contain the same variables. Both trainingData and valiData should include age_group, Fusion, and the columns indexed by c(1:length(ma)).
Is MRD_cat an ordered factor? I was unclear because of your statement, “Where the last three columns of training data are the ordinal response value and the two categorical factors. "ma" refers to a vector of genes I am giving as input.” If your response is an ordered factor, I am wondering if it is a problem with my predict function not structuring a factor variable appropriately. That is, if age_group is a K-level categorical variable, the modeling function will dummy code age_group. Maybe my predict function isn’t recognizing that. To check, maybe replace your age_group variable with K-1 dummy variables in both trainingData and valiData (and the same for Fusion if it is a K-level categorical variable) and see if that fixes the problem. If you try this and it works I would like to know as I could alter the predict function and update the package.
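As a rough illustration of the dummy-variable check suggested above, here is a minimal base R sketch. The toy data frames, the level names, the helper `make_dummies`, and the `is_` column prefix are all made up for the example; only `age_group`, `Fusion`, `trainingData`, and `valiData` come from this thread. It assumes the levels are taken from the training data so that both data sets end up with identical columns.

```r
# Toy stand-ins for trainingData and valiData (hypothetical values).
trainingData <- data.frame(
  age_group = factor(c("child", "adult", "infant", "adult"),
                     levels = c("infant", "child", "adult"), ordered = TRUE),
  Fusion = c("yes", "no", "no", "yes")
)
valiData <- data.frame(
  age_group = factor(c("adult", "child"),
                     levels = c("infant", "child", "adult"), ordered = TRUE),
  Fusion = c("no", "yes")
)

# Hypothetical helper: build K-1 treatment-coded dummy columns for one
# categorical variable. Passing the training levels explicitly keeps the
# columns identical in both data sets, even if a level is absent from the
# validation data.
make_dummies <- function(x, levels) {
  f <- factor(as.character(x), levels = levels)  # plain (unordered) factor
  mm <- model.matrix(~ f)[, -1, drop = FALSE]    # drop the intercept column
  colnames(mm) <- paste0("is_", levels[-1])      # first level is the baseline
  as.data.frame(mm)
}

age_levels    <- levels(trainingData$age_group)
train_dummies <- make_dummies(trainingData$age_group, age_levels)
vali_dummies  <- make_dummies(valiData$age_group, age_levels)

# Replace the original factor with its dummy columns in both data sets.
trainingData <- cbind(trainingData["Fusion"], train_dummies)
valiData     <- cbind(valiData["Fusion"], vali_dummies)
```

The dummy columns (here `is_child` and `is_adult`) would then take the place of `age_group` in both the model formula and the `neww` formula. The same helper could be applied to `Fusion` if it has more than two levels.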
Best, Kellie
Hi Kellie, thank you for your super fast reply (and sorry it took me so long to get back to you!). I changed my unpenalized factor "age_group" from ordered to character and then it works. I suppose the code might have a problem if both the response variable as well as an unpenalized predictor are ordered? Unfortunately, I realized my data set is too small to make accurate predictions, but I will be re-trying with a larger data set soon and will let you know if that runs well. Cheers, Alina
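For anyone hitting the same error, the workaround described above (ordered to character) can be sketched in base R as follows. The toy values and level names are invented for the example. The `model.matrix` calls only show how base R encodes the two versions of the variable; whether this difference is what causes the error inside the package's predict function is an assumption, not something confirmed in this thread.

```r
# Toy ordered factor standing in for age_group (hypothetical levels).
age_ordered <- factor(c("infant", "child", "adult", "child"),
                      levels = c("infant", "child", "adult"), ordered = TRUE)

# Base R gives ordered factors polynomial contrasts by default,
# so the design-matrix columns are named .L and .Q.
mm_ordered <- model.matrix(~ age_ordered)
colnames(mm_ordered)

# The workaround: convert to character. R then treats the variable as an
# unordered factor (levels sorted alphabetically) and uses treatment dummies.
age_char <- as.character(age_ordered)
mm_char  <- model.matrix(~ age_char)
colnames(mm_char)

# In practice the same conversion is applied to both data sets, e.g.
# trainingData$age_group <- as.character(trainingData$age_group)
# valiData$age_group     <- as.character(valiData$age_group)
```

Note that the conversion discards the ordering of the predictor, so the model treats `age_group` as a nominal variable.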
Hi, I really like your approach for modelling ordinal data and am trying it on my data. I have gene expression data and ordinal response data as well as categorical unpenalized factors. I have split my cohort into a training cohort and a validation cohort, but whenever I try using the predict function on the validation set, I get the error:
Error in neww %*% zeta : non-conformable arguments
My training function looks like this:
ordinal_bayes_cat[[v]] <- ordinalbayes(MRD_cat ~ age_group + Fusion, data = trainingData[,c(ma, ncol(trainingData), ncol(trainingData)-1, ncol(trainingData)-2)], x = trainingData[,c(1:length(ma))])
Where the last three columns of training data are the ordinal response value and the two categorical factors. "ma" refers to a vector of genes I am giving as input.
My prediction function for the validation set looks like this. The column names of trainingData and valiData are the same (several genes and the three factor columns):
phat_vali <- predict(ordinal_bayes_cat[[1]], neww = ~ age_group + Fusion, newx = as.matrix(valiData[,c(1:length(ma))]), newdata = as.data.frame(valiData[, c(ma, ncol(valiData), ncol(valiData)-1, ncol(valiData)-2)]))
Any ideas on why I am getting this error? Some reading I have done suggests that there is a problem with the dimensions of the matrices that are created in the function call, but I can't figure out how to fix it. Is it a problem when the training data and the data to be predicted have different sample sizes?
Help would be much appreciated. Best regards, Alina