sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

Suggestion about do_findChromPeaks() on MRM data #277

Closed rromoli closed 6 years ago

rromoli commented 6 years ago

Hi, I would like to use do_findChromPeaks() with the matchedFilter algorithm to detect and integrate some MRM data. MRM data are matrices with two columns: retention time and intensity. I tried to import the data into xcms via readMSData, but it fails (Error in object@backend$getAllScanHeaderInfo() : upper value must be greater than lower value). So I would like to call do_findChromPeaks_matchedFilter() manually, but I am having trouble setting the valsPerSpect parameter. What kind of data should I use to set this parameter correctly?

All the best

Riccardo

jorainer commented 6 years ago

Great to hear that you're trying to use the do_* functions! First, you can read MRM/SRM data with readSRMData from the MSnbase package, which returns a Chromatograms object.

Now, using the do_findChromPeaks* functions for data that is not organized in spectra is tricky. See also #169, where @wilsontom proposes a solution for converting MRM data to LC-MS style data. Still, if I find time I might extract the peak detection code into a function and export it. Peak detection on MRM/SRM data is something I have always wanted to do.

rromoli commented 6 years ago

I agree that the solution proposed in #169 is a little bit tricky. I would also like to develop a method to detect peaks in MRM space using the do_* functions. I think the matchedFilter approach is a good fit because MRM data are quite simple to integrate! I tried to work with it but do not understand what valsPerSpect stands for. Can you explain it to me?

jorainer commented 6 years ago

The int and rt values are supposed to be numeric vectors with the intensities and retention times, respectively, of all spectra (i.e. the first x values are from the first spectrum, the next y from the second, and so on). valsPerSpect is an integer vector with the same length as the number of spectra in the mzML file, telling the function how many values in int belong to each spectrum (i.e. it will be c(x, y, ...) for the example above).
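To make the layout concrete, here is a minimal base R sketch of how the flattened vectors and valsPerSpect relate to each other. The three spectra, their values, and the variable names `spectra`, `scantime` and `spectrum_idx` are made up for illustration; the sketch also assumes that the flattened vectors are the per-data-point values (m/z and intensity) and that there is one scan time per spectrum.

```r
# Three hypothetical spectra with 3, 2 and 4 data points each.
spectra <- list(
  data.frame(mz = c(100.1, 200.2, 300.3),        int = c(10, 50, 20)),
  data.frame(mz = c(100.1, 300.3),               int = c(12, 25)),
  data.frame(mz = c(100.1, 150.0, 200.2, 300.3), int = c(11, 5, 60, 22))
)
scantime <- c(1.0, 2.0, 3.0)  # assumption: one retention time per spectrum

# Concatenate all spectra into flat vectors, in acquisition order.
mz  <- unlist(lapply(spectra, function(s) s$mz))
int <- unlist(lapply(spectra, function(s) s$int))

# valsPerSpect: how many data points each spectrum contributes.
valsPerSpect <- vapply(spectra, nrow, integer(1))

# The counts are enough to map every flattened value back to its spectrum.
spectrum_idx <- rep(seq_along(valsPerSpect), times = valsPerSpect)
second <- int[spectrum_idx == 2]  # intensities of the second spectrum
```

Here valsPerSpect plays the role of c(x, y, ...) from the explanation above, and sum(valsPerSpect) always equals length(int).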

Now, looking at the code, I don't think it makes much sense to hack something together to use the do_findChromPeaks* functions. The good news is that I have already started writing a peaksWithMatchedFilter function that works on purely chromatographic data (you'll need just intensities and retention times). I hope to be done with it within the next couple of days; I still need to finish the implementation and run some tests.

The plan is then to have at some point a findChromPeaks,Chromatogram,... method to perform the peak detection on Chromatograms objects (ultimately also using centWave).

jorainer commented 6 years ago

You can now use the peaksWithMatchedFilter function if you install the most recent xcms from GitHub (devtools::install_github("sneumann/xcms", ref = "master")). This requires R-3.5.0 and BioC 3.7.
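A minimal usage sketch on simulated data, assuming an installed xcms version that exports peaksWithMatchedFilter. The chromatogram below is synthetic (a single Gaussian peak on a noisy baseline), and the fwhm and snthresh values are arbitrary choices for this example, not recommendations.

```r
set.seed(1)

# Simulated chromatogram: one Gaussian peak at 120 s on a positive baseline.
rt  <- seq(0, 300, by = 1)
int <- 1e4 * exp(-(rt - 120)^2 / (2 * 8^2)) +
  abs(rnorm(length(rt), mean = 50, sd = 10))

pks <- NULL
if (requireNamespace("xcms", quietly = TRUE)) {
  # Only intensities and retention times are needed, no spectra.
  pks <- xcms::peaksWithMatchedFilter(int, rt, fwhm = 20, snthresh = 3)
  print(pks)
} else {
  message("xcms is not installed; skipping peak detection")
}
```

For real SRM/MRM files, the intensities and retention times would come from the chromatograms returned by MSnbase::readSRMData() instead of the simulated vectors.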

rromoli commented 6 years ago

Thanks!!! This is great news! I tried it quickly and it seems to run well, although I still get an error when reading the raw data with readMSData (od <- readMSData(files[53], mode = "onDisk") fails with: Error in object@backend$getAllScanHeaderInfo() : upper value must be greater than lower value). I used readSRMData() from MSnbase instead and it works.

jorainer commented 6 years ago

Yes, readMSData cannot read SRM/MRM files since they contain no spectrum data (we should show a better error message, though). readSRMData should work.

Good to hear that it seems to work. Feel free to close the issue if you're happy.

rromoli commented 6 years ago

The only thing that I do not understand is the reported s/n. I used the function to integrate a very high peak, but I get a low s/n ratio (~7). I read the code and, if I understand it correctly, you estimate the noise as noise <- mean(int[int > 0]). I think this overestimates the noise, so I tried calculating the noise from the scan at the minimum of the matchedFilter output up to the last scan:

min_idx <- which.min(int_filt); noise <- mean(int[min_idx:length(int)])

This way the s/n ratio increases from the initial value of 7 to 55, which seems more realistic for my peak.

What do you think?

jorainer commented 6 years ago

peaksWithMatchedFilter uses the same s/n calculation that is also performed in the original matchedFilter method. The noise is clearly overestimated. I think the initial idea was that one slice of the MS data (i.e. one m/z bin along the full rt range) contains mostly noise with only a few peaks, which might not hold for SRM/MRM data. I think your approach is better at estimating the s/n for your peak.
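The difference between the two noise estimates can be reproduced on simulated data with base R alone. This is only a sketch under several assumptions: the chromatogram is synthetic, the second-derivative Gaussian kernel is a simplified stand-in for the filter used by xcms, the signal is taken as the maximum of the filtered trace, and the search for the minimum is restricted to scans after the apex (a small variation on the snippet above, so that the peak itself is never included in the noise region).

```r
set.seed(42)

# Simulated MRM trace: one tall Gaussian peak on a low positive baseline.
rt  <- seq(0, 300, by = 1)
int <- 1e5 * exp(-(rt - 100)^2 / (2 * 5^2)) +
  abs(rnorm(length(rt), mean = 200, sd = 50))

# Simplified second-derivative Gaussian kernel (assumption, not the xcms code).
sigma <- 5
k     <- seq(-4 * sigma, 4 * sigma)
kern  <- (1 - k^2 / sigma^2) * exp(-k^2 / (2 * sigma^2))
int_filt <- as.numeric(stats::filter(int, kern, sides = 2))
int_filt[is.na(int_filt)] <- 0  # the edges cannot be filtered

signal <- max(int_filt)

# Estimate 1: mean of all positive intensities, which includes the peak.
noise_global <- mean(int[int > 0])

# Estimate 2: mean intensity from the filter minimum after the apex onwards.
apex       <- which.max(int_filt)
min_idx    <- apex + which.min(int_filt[apex:length(int_filt)]) - 1L
noise_tail <- mean(int[min_idx:length(int)])

sn_global <- signal / noise_global
sn_tail   <- signal / noise_tail
```

Because the peak dominates the mean of the whole trace, noise_global ends up much larger than noise_tail, so sn_global is correspondingly lower than sn_tail.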

rromoli commented 6 years ago

Do you think this change can be implemented into the function?

jorainer commented 6 years ago

I'd like to keep the original code - for consistency reasons - because if you used the function on LCMS data you would get the same results.

jorainer commented 6 years ago

I am also currently working on the peaksWithCentWave function (issue #279). I expect it to perform better at detecting peaks of different widths (and also to estimate the s/n in a better way).

rromoli commented 6 years ago

That's great! I'm working on functions to collect all the MRM experiments from several files and build a single data.frame with all the results in the xcms style: a sort of featureDefinitions() and featureValues() for MRM data.

jorainer commented 6 years ago

Nice! I was also thinking of something like that - ideally I would love to have an object (e.g. ChromPeaks) that extends the Chromatograms class from MSnbase. That would then be analogous to the xcms::XCMSnExp extending the MSnbase::OnDiskMSnExp.

Feel free to make pull requests against xcms if you like.