jcapelladesto / geoRge

geoRge: a computational tool for stable isotope labelling detection in LC/MS-based untargeted metabolomics
GNU General Public License v3.0

NaN values in X1 matrix #12

Closed arjanab closed 3 years ago

arjanab commented 4 years ago

Hi,

I processed my raw MS data using XCMS (functions xcmsSet, retcor, group) as you describe in the paper. In the first line of PuInc_seeker, xcms::groupval creates a matrix with NA values. This is different from the groupval matrix created from your test data, mtbls213, which seems to have no missing values. The XCMS documentation says that NA values occur when there is no peak for that specific sample, which doesn't seem surprising to me. However, I think these NA values cause problems when filtsampsint, which contains a lot of NAs, is used for selection in the Welch's test.
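A quick way to see how widespread the missing values are is to count the NAs per feature. The sketch below uses a small made-up matrix in place of the real groupval output; the feature names, sample names, and intensities are all invented for illustration.

```r
# Toy stand-in for a groupval() result: rows are features, columns are samples.
# All names and values here are invented for illustration.
x1 <- matrix(
  c(1200,   NA, 1500, 1300,
      NA,   NA,  800,  950,
    4000, 4100, 3900, 4200),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("feat1", "feat2", "feat3"),
                  c("C12_a", "C12_b", "C13_a", "C13_b"))
)

# Number of missing values per feature (row) and in the whole matrix
na_per_feature <- rowSums(is.na(x1))
total_na <- sum(is.na(x1))

print(na_per_feature)
print(total_na)
```

Features with many NAs are the ones most likely to leave too few values for a per-group test.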

Do you know this to be a problem? If yes how do I go about this?

Any help would be appreciated!

It looks like geoRge is the perfect tool for my project, so I really hope there is a solution for this issue.

Thanks in advance! Arjana Begzati

jcapelladesto commented 4 years ago

Hi,

It seems that you did not run the function fillPeaks from XCMS. This function imputes a value for features in samples where the peak-picking phase did not detect a peak (this may be caused by a highly differential ion, which is the most probable cause in labelling experiments). It is very simple to run: your_xcmsSet <- fillPeaks(your_xcmsSet)

Try to run PuInc_seeker again after filling the peaks.
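For reference, here is a rough sketch of the legacy xcms workflow ending with fillPeaks. It is an assumption-laden outline, not geoRge's documented recipe: the wrapper name run_legacy_xcms and its arguments are hypothetical, default parameters are used throughout, and the order group, retcor, group, fillPeaks is simply the usual legacy sequence. It needs the xcms package and your own raw files to actually run.

```r
# Hypothetical helper (not part of geoRge or xcms): wraps the legacy xcms steps.
# 'files' is a character vector of raw data paths, 'classes' the sample classes.
run_legacy_xcms <- function(files, classes) {
  if (!requireNamespace("xcms", quietly = TRUE)) {
    stop("The xcms package is required for this sketch.")
  }
  xset <- xcms::xcmsSet(files, sclass = classes)  # peak picking
  xset <- xcms::group(xset)                       # first correspondence
  xset <- xcms::retcor(xset)                      # retention time correction
  xset <- xcms::group(xset)                       # regroup after correction
  xset <- xcms::fillPeaks(xset)                   # fill in missing peaks

  # Report how many NAs remain in the feature matrix after filling
  x1 <- xcms::groupval(xset, value = "maxo")
  message("NA values after fillPeaks: ", sum(is.na(x1)))
  xset
}
```

The returned xcmsSet would then be the object passed on to PuInc_seeker.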

arjanab commented 4 years ago

I was advised by the people who maintain the xcms package to use their new functions (findChromPeaks(), groupChromPeaks(), fillChromPeaks()) instead of the ones listed in the geoRge paper. The new functions produce an XCMSnExp object instead of an xcmsSet. Can geoRge's functions accept that? When I tried to use it in PuInc_seeker(), it didn't work. It also didn't work after I tried to convert it with as(peaks_rtcor_grouped_filled, 'xcmsSet').

jcapelladesto commented 4 years ago

I am sorry, but geoRge is currently not compatible with the xcms/MSnbase output format.

I am unsure why coercing to xcmsSet (using "as") fails. Does it maintain the phenoData matrix? Can you extract the feature correspondence matrix using groupval(xcmsSet,"maxo") or xcmsSet@groups?

To run geoRge, I would strongly suggest that you use the old xcms functions (xcmsSet, retcor, group, fillPeaks).

arjanab commented 4 years ago

Thank you, I will just use the old xcms functions for now.

The fold changes I get from PuInc_seeker() are mostly Inf (and one NaN). Is that reasonable? If not, do you know why this could be happening? The only pre-processing issue I am still trying to deal with is that some of the values filled in by xcms seem too high to me.

Thanks for your help!

jcapelladesto commented 4 years ago

I think this is because there are a lot of zero intensities in your data (as a result of filling peaks).
PuInc_seeker calculates fold changes between the 13C and 12C groups. Therefore, Inf comes from dividing a nonzero value by zero (x/0), and NaN is the result of 0/0.
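The arithmetic can be reproduced in plain R. This is a minimal sketch with invented intensities, and it assumes a simple 13C/12C ratio rather than reproducing PuInc_seeker's exact calculation.

```r
# Invented intensities for three features in the 12C and 13C groups
c12 <- c(100, 0, 0)
c13 <- c(250, 80, 0)

# Simple ratio, assumed here for illustration only
fold_change <- c13 / c12
print(fold_change)

# Flag the problematic entries
print(is.infinite(fold_change))  # nonzero / 0
print(is.nan(fold_change))       # 0 / 0
```

The first feature gives an ordinary ratio, the second divides a nonzero value by zero, and the third divides zero by zero.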

You can remove the NaN entries from $PuInc if you want basepeak_finder to ignore them. Just overwrite it, indexing with the negation of apply($PuInc, 1, function(x) any(is.na(x))).
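A minimal sketch of that filtering step on a toy table is shown below. The object name puinc, its column names, and its values are made up for illustration; the real $PuInc element returned by PuInc_seeker may be structured differently.

```r
# Toy stand-in for the $PuInc element; names and values are hypothetical.
puinc <- data.frame(
  fold   = c(2.5, Inf, NaN),
  pvalue = c(0.01, 0.002, 0.5),
  row.names = c("feat1", "feat2", "feat3")
)

# TRUE for rows that contain at least one NA or NaN (is.na() is TRUE for both)
has_na <- apply(puinc, 1, function(x) any(is.na(x)))

# Keep only the rows without missing values
puinc_clean <- puinc[!has_na, ]
print(puinc_clean)
```

Note that is.na() does not flag Inf, so rows with infinite fold changes are kept by this filter.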