sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

function to get the areas of a list of mz-rt pairs #525

Closed mar-garcia closed 3 years ago

mar-garcia commented 3 years ago

Hi!

I was wondering if there is (or if there would be... :) any function to integrate the areas of a specific list of mz-rt pairs. I mean, I'm interested in tailoring the peak detection for specific compounds I know are present in the samples, instead of doing the peak detection of all signals in the samples.

Thanks!!! Mar

stanstrup commented 3 years ago

I see 3 options:

1) You could create chromatograms for each of those pairs and do the peak-picking on the chromatogram objects.
2) Make subset objects for all files/interval pairs, do peak picking on those, and then filter.
3) Supply roiList to the peak picking object.

Practically you probably won't gain much in terms of speed by 1 and 2 compared to just doing full peak-picking and filtering afterwards. I am not sure how precise the ROIs need to be for the peak picking to play nice but that might be a bit faster. In general I think it is correct to say that RT subsetting makes things faster but m/z subsetting does not really as it has to read the whole scan anyway. 1 or 2 might be faster with an in memory object but I don't know.

jorainer commented 3 years ago

Thanks @stanstrup for the feedback! That's definitely an option. I would in addition suggest the following which would allow to integrate/fix peak detections manually:

So, this is in fact different to the ROI solution, as with that one centWave would be run again on the provided ROI - this solution is to add peaks with the manually specified peak boundaries.

Does this sound like an acceptable solution?

mar-garcia commented 3 years ago

Thanks a lot @stanstrup and @jorainer !!!!

For the moment I have done something similar to the @stanstrup suggestion (if I understood well): (1) I make a list of the mz-rt pairs; then (2) I go sample by sample and compound by compound and use the filterRt() function to filter the file according to the RT of the compound of interest; after that (3) I do the peak detection on this "new object"; and then (4) I look for the peak with my mz of interest. Maybe the suggestion of @jorainer could speed up the processing... However, from what I understood, first the peak detection would have to be done according to the "general" workflow, that is, for all the signals? My idea was to go directly for the mz-rt pairs, since I suppose the required time would be lower because the algorithm would only have to search for the signals of interest (and not all the signals each sample contains)... Not sure if I'm clear...

jorainer commented 3 years ago

We could make the manualChromPeaks independent of the findChromPeaks, so that you could only do the manual peak integration on the data.

The difference to the findChromPeaks with specifying ROIlist or the subsetting and subsequent call of findChromPeaks would be that the manualChromPeaks does not do any peak detection with e.g. centWave but simply does the peak integration/quantification based on the provided m/z - rt boundaries. Is this what would work for you?

mar-garcia commented 3 years ago

If I understood well, yes @jorainer. In the end, what I would like to do is to avoid doing manual integration of peaks of interest (which we can usually do with the vendor's software) and try to do it in an automatic (and reproducible) way with an open algorithm. I'm interested in this since, for example, sometimes the general parameters do not work perfectly well for all peaks. Therefore, when I'm interested in a specific mz-rt pair (for example, an internal standard, a potential biomarker, etc.) I would like to be sure that the peak areas I'm working with are integrated in the "best way" and in this way avoid potential bias related to the data processing.

jorainer commented 3 years ago

You can try the new function after installing the current devel version (BiocManager::install("sneumann/xcms")).

Just have a look at ?manualChromPeaks. All you need is a matrix with mzmin, mzmax, rtmin and rtmax and the function will add these manual peaks to the chromPeaks matrix and do the integration.
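
A minimal sketch of how such a matrix could be put together from a list of target compounds. The target values, the 10 ppm m/z half-width and the 15 s retention time half-width are made-up assumptions for illustration, and `xdata` stands for an already loaded xcms data object that is not defined here, so the call to manualChromPeaks is only attempted if xcms and `xdata` are available:

```r
# Hypothetical targets: m/z and expected retention time (seconds) per compound.
targets <- data.frame(
    name = c("compound_A", "compound_B"),
    mz   = c(200.0, 500.0),
    rt   = c(120, 300)
)
ppm     <- 10   # assumed m/z half-width, in ppm
rt_half <- 15   # assumed retention time half-width, in seconds

# One row per target (same order as `targets`), with the four columns
# mzmin, mzmax, rtmin and rtmax mentioned above.
mz_tol <- targets$mz * ppm / 1e6
pks <- cbind(
    mzmin = targets$mz - mz_tol,
    mzmax = targets$mz + mz_tol,
    rtmin = targets$rt - rt_half,
    rtmax = targets$rt + rt_half
)

# Only attempted when xcms is installed and an object `xdata` already exists.
if (requireNamespace("xcms", quietly = TRUE) && exists("xdata")) {
    xdata <- xcms::manualChromPeaks(xdata, chromPeaks = pks)
}
```

The windows are taken as given and are not refined by any peak detection, so in practice they would need to be wide enough to cover the whole peak in every sample.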

mar-garcia commented 3 years ago

Seems it's working as I needed! :) Hoping it will be useful for other users too!! Thanks!!!

jorainer commented 3 years ago

Good that it's working! I'll close the issue - feel free to re-open if needed.

stanstrup commented 2 years ago

Would it be possible to get fitgauss in this method?

EDIT: ah I see that this doesn't actually detect the peak.

gmhhope commented 2 years ago

@jorainer I greatly appreciate this function, which helps perform some targeted peak picking and feature extraction. It is of great help!

However, after getting the peak table and feature table, I cannot easily trace back to the peaks I am looking for, though I can do my own m/z search with a certain tolerance within the peak table or feature table. If somehow those peaks could be indexed with my original targeted m/z, rt (and even a delta m/z), it would be great!

Thanks, Minghao Gong

jorainer commented 2 years ago

@gmhhope , if I understand, you would like to have some additional annotation in the chromPeaksData DataFrame for these manually integrated peaks? Would it be sufficient to have a column "manual" with TRUE/FALSE?

pablovgd commented 2 years ago

Dear @jorainer, I am actually looking for something with the same functionality as described by @gmhhope. I have a list of compounds that were found in targeted analysis using vendor software, defined by RT and m/z. Now I'm looking for a quick way to find these compounds in the resulting feature table to check if XCMS has found these compounds as well. I've looked around on GitHub but didn't find quite what I would need. Any suggestions? Thanks a lot!

Kind regards Pablo

jorainer commented 2 years ago

Hi Pablo,

this would be possible with the MetaboAnnotation package (which is now also available in Bioconductor, so, given you have R 4.2 installed, you can install it with BiocManager::install("MetaboAnnotation")).

Basically, what you would do is a) read your vendor software information (RT and m/z) into R as a data.frame and b) use the matchValues function to match the m/z and retention time values from your data.frame against those of the features found by xcms. The code could look like:

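library(xcms)
library(MetaboAnnotation)
# Match on m/z (within 10 ppm) and on retention time (tolerance of 5)
prm <- MzRtParam(ppm = 10, toleranceRt = 5)
mtch <- matchValues(vendor_df, featureDefinitions(xdata), param = prm,
    mzColname = c("mz", "mzmed"), rtColname = c("RT", "rtmed"))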

vendor_df is assumed to be the data.frame with the information from your vendor software with columns "mz" and "RT" containing the m/z and retention time values. xdata is the result object from xcms. Obviously you might want to adapt the ppm, tolerance and toleranceRt parameters to your setting.

The mtch object would then contain the results from this matching. Maybe have a look at the vignette of the MetaboAnnotation package for some first information and eventually come back if something is not working or is unclear. Then it would however be better to open an issue directly in MetaboAnnotation.
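
To make the matching criterion concrete, here is a small base R sketch of what matching on a ppm and a retention time tolerance amounts to. It is only an illustration and not the MetaboAnnotation implementation; the tables, the feature IDs and the choice to compute the ppm window relative to the vendor m/z are assumptions:

```r
# Hypothetical vendor list and feature table (column names as in the code above).
vendor_df <- data.frame(mz = c(200.0000, 350.1000), RT = c(120, 240))
features  <- data.frame(mzmed = c(200.0010, 350.2000, 200.0015),
                        rtmed = c(122, 241, 180),
                        row.names = c("FT001", "FT002", "FT003"))
ppm          <- 10  # allowed m/z deviation in ppm
tolerance_rt <- 5   # allowed retention time deviation

# For one vendor entry, return the IDs of all features within both tolerances.
match_one <- function(mz, rt) {
    mz_ok <- abs(features$mzmed - mz) <= mz * ppm / 1e6
    rt_ok <- abs(features$rtmed - rt) <= tolerance_rt
    rownames(features)[mz_ok & rt_ok]
}

# One list element per vendor entry; an empty vector means "not found".
hits <- Map(match_one, vendor_df$mz, vendor_df$RT)
```

Here the first vendor compound matches only FT001 (FT003 is close in m/z but far off in retention time), and the second compound has no match.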

pablovgd commented 2 years ago

Hi Johannes!

Thanks again for the quick reply! I'll take a look at the vignette of MetaboAnnotation.

Greetings Pablo

AnupamGautam commented 2 years ago

Dear @jorainer

Sorry if I am asking a question under a different topic, but I saw your above code regarding MetaboAnnotation and found it matches my case.

I have analyzed LC-MS/MS data for 29 samples (belonging to four groups). After processing the samples, I generated a feature table by merging the tables from featureDefinitions() and featureValues() and finally filtering based on c("peakidx"). I got the column headers "Row.names", "mzmed", "mzmin", "mzmax", "rtmed", "rtmin", "rtmax", "npeaks", "G1_E2", "G1_E3", "G2_E2", "G2_E3", "ms_level", "sample1", and so on.

Q) If I want to annotate the results based on the mz value, can I use the mzmed column as a query directly for HMDB, Metlin, or MetaboAnnotation (as you showed in your code above), or do I need to do any further processing to get the mz value? I am a bit confused: are the mzmed values (mzmed = mz) the final values we should use for compound annotation?

Regards, Anupam

jorainer commented 2 years ago

Dear Anupam,

the column "mzmed" represents the median m/z value of all chromatographic peaks (in all samples) assigned to that feature. So, yes, you can consider the "mzmed" as the m/z of your feature and use that for annotation.
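
As a toy illustration of that definition, the base R sketch below recomputes such a median from a made-up peak table. The peak values and the `peakidx` list (peak row indices per feature, named after the "peakidx" column mentioned earlier in this thread) are hypothetical and not taken from real data:

```r
# Hypothetical chromatographic peaks, one row per peak across all samples.
peaks <- data.frame(mz     = c(200.001, 200.003, 200.002, 350.100, 350.104),
                    sample = c(1, 2, 3, 1, 2))

# Hypothetical grouping: row indices of the peaks assigned to each feature.
peakidx <- list(FT001 = c(1, 2, 3), FT002 = c(4, 5))

# Median m/z of the peaks assigned to each feature.
mzmed <- vapply(peakidx, function(i) median(peaks$mz[i]), numeric(1))
```

For FT001 the result is the middle of its three peak m/z values, and for FT002 it is the midpoint of its two.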

AnupamGautam commented 2 years ago

Dear Johannes,

Thank you for the clarification.

Regards, Anupam