XCMS-fillChromPeaks: more options for the remaining NA

workflow4metabolomics / tools-metabolomics

Galaxy tools for metabolomics maintained by Workflow4Metabolomics

https://workflow4metabolomics.org/

GNU General Public License v3.0

XCMS-fillChromPeaks: more options for the remaining NA #141

Open melpetera opened 4 years ago

melpetera commented 4 years ago

Hi there,

Here is a suggestion for the XCMS step, on how to deal with NAs that remain NA even after fillChromPeaks. Currently, we have the option to leave these NAs as 'NA' or to convert them into '0'. The idea would be to provide a third choice that provides a controlled random value instead of 0.

This random value provided to replace the NA could be defined as an integer randomly selected between inf.range and sup.range, where:

inf.range is the minimum of the random range, specified by the user (default 0), coded between 0 and 1: 0 means 0, and any other value is a proportion of the ion's minimum value. Example: if the minimum value of a given ion is 1500 and the user sets inf.range to 0.5, then the minimum of the random range will be 750.
sup.range is the maximum of the random range, also specified by the user (default 1), coded the same way as inf.range. Example: setting it to 0.5 means that the maximum of the random range would be 750 if the minimum value of a given ion is 1500.
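To make the two definitions above concrete, here is a minimal sketch of how the random range could be derived for one ion. The helper name `impute_range` is hypothetical and is not part of xcms or of the Galaxy wrapper; it only illustrates the arithmetic described above.

```r
# Hypothetical helper: bounds of the random range for one ion,
# expressed as proportions of the ion's minimum non-NA value.
impute_range <- function(ion, inf.range = 0, sup.range = 1) {
  stopifnot(inf.range >= 0, sup.range <= 1, inf.range <= sup.range)
  min_val <- min(ion, na.rm = TRUE)
  c(lower = inf.range * min_val, upper = sup.range * min_val)
}

ion <- c(1500, 2300, NA, 4100)
bounds <- impute_range(ion, inf.range = 0.25, sup.range = 0.5)
# lower bound: 0.25 * 1500 = 375; upper bound: 0.5 * 1500 = 750
```

With the defaults (inf.range = 0, sup.range = 1), the range simply runs from 0 to the ion's minimum value.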

Note: since the replacement values are random, a "seed" option is needed so that the user can obtain the same result when re-running.
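As an illustration of the seed option, the sketch below fixes the random number generator with `set.seed` before drawing the replacement values. The function name `impute_with_seed` and its `seed` argument are assumptions made for this example only; they do not exist in xcms or in the wrapper.

```r
# Hypothetical sketch: impute the NAs of one ion reproducibly.
impute_with_seed <- function(ion, inf.range = 0, sup.range = 1, seed = NULL) {
  # Fix the random number generator only when a seed is supplied.
  if (!is.null(seed)) set.seed(seed)
  na_idx <- is.na(ion)
  min_val <- min(ion, na.rm = TRUE)
  ion[na_idx] <- runif(sum(na_idx),
                       min = inf.range * min_val,
                       max = sup.range * min_val)
  ion
}

ion <- c(1500, NA, 2300, NA, 4100)
run1 <- impute_with_seed(ion, seed = 42)
run2 <- impute_with_seed(ion, seed = 42)
identical(run1, run2)  # TRUE: the same seed gives the same imputed values
```

In a real tool, the seed would probably be set once before looping over all ions rather than once per ion; otherwise every column would draw from the same random sequence.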

@lecorguille do not hesitate to ask if this request is not clear!

Have a nice day, @jfrancoismartin and @melpetera

melpetera commented 4 years ago

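Efficient code from @jfrancoismartin

Notes:

The function replaces the NAs of one ion with random values drawn by runif between inf.range and sup.range times the ion's minimum non-NA value (between 0 and that minimum with the defaults). It is called through apply so that it runs on every column of dataMatrix.
idm = given_ion
The example is a call for a dataMatrix DM: the NAs of each variable (column) are replaced by random values between 0 and that variable's minimum.

```r
# Replace the NAs of one ion (one column of dataMatrix) with random values
# drawn uniformly between inf.range * min and sup.range * min.
imputNA <- function(idm, inf.range = 0, sup.range = 1) {
  if (anyNA(idm)) {
    nbNA <- sum(is.na(idm))
    minVal <- min(idm, na.rm = TRUE)
    idm[is.na(idm)] <- runif(nbNA, min = inf.range * minVal, max = sup.range * minVal)
  }
  idm  # returned unchanged when the column has no NA
}

# Apply to every column; inf.range and sup.range are passed on to imputNA
DM <- apply(X = DM, MARGIN = 2, FUN = imputNA, inf.range = 0, sup.range = 1)
```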

lecorguille commented 4 years ago

Hum, for me, it's typically something that should be integrated into the main xcms package: https://github.com/sneumann/xcms What do you think about that?

jfrancoismartin commented 4 years ago

hum hum... actually, xcms fillpeaks tries to replace NA with a value extracted from the raw MS file. It is an analytical replacement. If fillpeaks can't find a value, then it becomes a statistical issue that is outside the scope of xcms. And we can propose this kind of NA imputation, which is more elegant than just replacing with 0.

lecorguille commented 4 years ago

I guess that one of the purposes of xcms is to provide input for statistical analysis. So it could be an xcms issue :)

What do you think about that @sneumann and @jorainer?

My idea is to reduce the code in the wrapper. If it's not something that is interesting to add to XCMS, should we add this code to our future utils R package?

jorainer commented 4 years ago

Note, xcms already has some imputation functionality: xcms::imputeRowMinRand and xcms::imputeRowMin. Nothing spectacular though.