zellerlab / siamcat

R package for Statistical Inference of Associations between Microbial Communities And host phenoType
https://siamcat.embl.de/

Incorrect LOOCV split via create.data.split #6

Closed (akg2685 closed this issue 4 years ago)

akg2685 commented 4 years ago

The documentation suggests that leave-one-out cross-validation (LOOCV) can be performed with create.data.split by setting num.folds equal to n, the total number of samples. Although the function's verbose output claims that this is what happens, the function actually creates n-1 folds and always puts two samples in the first test fold.
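To make the expected behaviour concrete, here is a minimal, language-neutral sketch in plain Python. It is not SIAMCAT code and does not reproduce how create.data.split works internally; the helper names `loocv_folds` and `is_valid_loocv` are hypothetical and exist only for illustration. It shows what a correct LOOCV split looks like (n test folds, each holding out exactly one sample) and checks a split shaped like the one described above (n-1 folds, with two samples in the first test fold) against that definition.

```python
def loocv_folds(n_samples):
    """Return one test fold per sample: fold i holds out only sample i."""
    return [[i] for i in range(n_samples)]


def is_valid_loocv(test_folds, n_samples):
    """Check that test_folds is a leave-one-out split of n_samples samples."""
    # LOOCV needs exactly as many folds as there are samples.
    if len(test_folds) != n_samples:
        return False
    # Every test fold must contain exactly one sample.
    if any(len(fold) != 1 for fold in test_folds):
        return False
    # Each sample must be held out exactly once.
    held_out = sorted(fold[0] for fold in test_folds)
    return held_out == list(range(n_samples))


n = 5
expected = loocv_folds(n)
# A split with the shape described in this issue: n - 1 folds,
# with two samples in the first test fold.
reported = [[0, 1]] + [[i] for i in range(2, n)]

print(is_valid_loocv(expected, n))
print(is_valid_loocv(reported, n))
```

A check of this kind, applied to the test folds stored in the siamcat object after create.data.split, would distinguish a true LOOCV split from the n-1 fold split reported here.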

Additionally (though I'm not sure how related it is), the siamcat object produced by attempting LOOCV with num.folds = n in create.data.split fails in make.predictions with the following error message:

Error in make.predictions(siamcat) : nrow(data) == length(test.label) is not TRUE

These issues are reproducible using the data and procedure from the SIAMCAT basic vignette.

jakob-wirbel commented 4 years ago

Thanks for catching this! I'm at a conference at the moment, but I will check why this is happening once I'm back.