kreutz-lab / DIMAR

Data-driven selection of an imputation algorithm in R

dimarMatrixPreparation.R: nacut filtering #2

Closed aretaon closed 2 years ago

aretaon commented 2 years ago

Hi and thanks for this nice piece of software!

I noticed that the way the nacut variable is used in https://github.com/kreutz-lab/DIMAR/blob/0c6407ee4ed110b97a0fe9ca4708c3bb56dc4134/R/dimarMatrixPreparation.R differs from how it is documented in the params section. For example, with nacut == 2, features with 2 or fewer data points are removed (so features with exactly 2 data points are not kept), whereas the params state:

@param nacut minimum number of measured data points

Just for clarity, I would suggest changing the comparison operator in your row selection:

    if (nacut >= 0 && nacut < 1) {
        mtx = mtx[rowSums(!is.na(mtx)) >= nacut*ncol(mtx),]
        print(paste("Features with less than", nacut * 100, "percent of data points are removed."))
    } else if (nacut >= 1) {
        mtx = mtx[rowSums(!is.na(mtx)) >= nacut,]
        print(paste("Features with less than",nacut,"data points are removed."))
    } else {
        warning(paste('dimarMatrixPreparation: nacut',nacut,'not known. Expand code here. No transformation is performed.'))
    }
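
To make the difference concrete, here is a minimal sketch on a made-up 3 x 4 matrix. The data and the names `n_measured`, `kept_strict`, `kept_inclusive` and `kept_fraction` are purely illustrative and not part of DIMAR, and the strict `>` variant is only my assumption about what the current code does, based on the behaviour described above:

```r
# Toy matrix: rows are features, columns are samples (hypothetical data).
mtx <- matrix(
  c(1.0, NA,  NA,  NA,   # f1: 1 measured value
    1.0, 2.0, NA,  NA,   # f2: 2 measured values
    1.0, 2.0, 3.0, NA),  # f3: 3 measured values
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"), paste0("s", 1:4))
)

# Number of non-missing values per feature.
n_measured <- rowSums(!is.na(mtx))

nacut <- 2

# Strict comparison (assumed current behaviour): features with exactly
# nacut measured values are dropped.
kept_strict <- rownames(mtx)[n_measured > nacut]

# Inclusive comparison (suggested): nacut acts as a true minimum.
kept_inclusive <- rownames(mtx)[n_measured >= nacut]

# Fractional cutoff: 0.5 * 4 columns = at least 2 measured values.
nacut_fraction <- 0.5
kept_fraction <- rownames(mtx)[n_measured >= nacut_fraction * ncol(mtx)]

print(kept_strict)     # "f3"
print(kept_inclusive)  # "f2" "f3"
print(kept_fraction)   # "f2" "f3"
```

With `>=`, a feature with exactly 2 measured values survives for nacut == 2, which matches the "minimum number of measured data points" wording. As a side note, when subsetting the matrix itself, `mtx[keep, , drop = FALSE]` keeps the result a matrix even if only one feature remains.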

Cheers!

Julian

ebrombacher commented 2 years ago

Hi Julian,

Many thanks for your helpful suggestion. I changed the code accordingly.

Best,

Eva