sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
183 stars 80 forks source link

Request explanation on plot generated by plotChromPeakDensity #557

Closed gmhhope closed 3 years ago

gmhhope commented 3 years ago

Hi Johannes,

I am trying to completely understand the resulted figures of plotChromPeakDensity to evaluate extracted features. Although I did try to go through issues and tutorial on the website, it looks like I still need your guidances on this. See the small report I generated.

Attachment and codes

if(TRUE) { group_colors <- c("#99999980","#0000ff80", "#ff000080" ) names(group_colors) <- c("G1_Naive","G2_R5pos","G3_R5neg") sample_colors <- group_colors[xdata$sample_group] pdf(paste(output_dir,"incl_list","debugging_chromPeak_evaluation.pdf", sep = ""), width = 10, height = 10) par(mfrow = c(4, 4)) for (i in list_FT) { feature_chroms <- featureChromatograms(xdata, features = i) plotChromPeakDensity(feature_chroms, col = sample_colors, simulate = FALSE, # param = pdp, peakBg = sample_colors[chromPeaks(feature_chroms)[, "sample"]], peakCol = sample_colors[chromPeaks(feature_chroms)[, "sample"]], peakPch = 16, sub = incl_l[incl_l$X == i,"sub_title"])

plot(feature_chroms, col = sample_colors, xlab = "",

peakBg = sample_colors[chromPeaks(feature_chroms)[, "sample"]], sub = incl_l[incl_l$X == i,"sub_title"]) #as.character(i)

} dev.off() }



#### Questions
- Some of the chromatograms have shading and some don't have but with only lines. Doe each piece of shading indicate a detected peak area? What does it mean for those who don't have shadings and have only line?
- In terms of mz range and retention time range, it looks like they are not directly from the mzmin - mzmax, which can be see in the `featureDefinitions.csv`; and the retention time is also not from rtmin - rtmax. How the range was generated?
- I saw some samples have two peaks on a single sample (two or three dots in the same row); Are they supposed to merge during `refineChromPeaks`? Does it mean something get wrong in certain steps?
- (optional question) Somehow I cannot get consistent direction of FC differences (calculated in the feature table level) across each feature by eyeballing the differences between sample_groups ( "blue" group/ "red" group). I am still trying to locate the issue. It may be beyond your scope. But if you give me any suggestion for this question, that will be great!

Thanks very much for your tremendous supports on XCMS! I am looking forward to any of your feedbacks!

Best regards,
Minghao Gong
jorainer commented 3 years ago

Hi Minghao,

  • Some of the chromatograms have shading and some don't have but with only lines...

correct, the shading indicates the actually detected chromatographic peaks. For your first feature in the pdf: I guess that the peak detection run did find a large peak in some samples and did split the peak (i.e. found two separate ones) for other ssamples (the red lines without shading). This could be a problem as the peak area in some samples is much larger than in others. Maybe some of the problems could be solved with refineChromPeaks with MergeNeighboringPeaksParam (see the vignette for some info on that).

  • In terms of mz range and retention time range...

You're right. The "mzmin", "mzmax", "rtmin" and "rtmax" in the featureDefinitions represents the min and max of the m/z and rts of the peak apex of each chromatographic peak of the feature. It is not the actual m/z and rt range of all chromatographic peaks of the feature. Thus, "rtmin" and "rtmax" in featureDefinitions is the min and max of the values in column "rt" of the chromPeaks matrix (and the same for "mzmin" and "mzmax") - it is not the min and max of the "rtmin" and "rtmax" columns in chromPeaks. For the rtmin and rtmax, what you have in the featureDefinitions is actually the min and max rt that of the points in the lower part of the chromPeakDensity plots.I know, this is misleading, but I did not want to change this from the original xcms implementation.

  • I saw some samples have two peaks on a single sample

Hard to answer. For our data we always run the refineChromPeaks and then sometimes I also use more "relaxed" settings in the correspondence by choosing a higher bw and thus include neighboring chromatographic peaks (in the same sample) into the same feature. But this really depends on the samples/experiment. I do this mostly for larger data sets where I see that alignment and correspondence have a difficult time to separate isomers (typical candidates for us are fructose and glucose, or leucine and isoleucine). When I call then quantify or featureValues to extract the feature intensities I then also use the parameter method = "sum" which will sum up the intensities of all chromatographic peaks in the same sample assigned to the same feature. I don't think you did anything wrong, but it's good to evaluate these plots and then you have to decide how to proceed, e.g. refine some of your settings (e.g. increase/decrease bw or play with the settings of refineChromPeaks).

  • (optional question) Somehow I cannot get consistent direction of FC differences

I do not quite understand what you mean with consistent direction of the FC - but I guess that has to do with your specific data set. Note also that at this stage you still have raw abundances that you might or might not have to normalize (depending also on the experimental set up, how the samples were measured etc). A PCA could however always be helpful here to see if the samples generally group as expected. (also you might want to run fillChromPeaks before the PCA...)

gmhhope commented 3 years ago

Thanks for the answer Johannes,

I have been very busy this week and I am reading carefully with your note. Thanks for the detailed explanation!

  1. I saw you mentioned the quantify method and you suggested me to do method with 'sum'.

    When I call then quantify or featureValues to extract the feature intensities I then also use the parameter method = "sum" which will sum up the intensities of all chromatographic peaks in the same sample assigned to the same feature.

Is the default method "trapezoidation" (the area under the peaks)? And I did observe some differences in certain sample within a particular feature. So if there are two peaks in one feature, how the default method behave comparing to "sum" method? Will it pick only one peak for the quantification by the Area under peak?

> quanitfy_wt_sum['D30_1_1.mzML']
D30_1_1.mzML 
     1065558 
> quantify_wt_default['D30_1_1.mzML']
D30_1_1.mzML 
      299604 
  1. What will be a reasonable range of bw? My previous setting is 1.8 and it looks like it is too small.

    pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
                        minFraction = 0.8, bw = 1.8)
  2. And I did run MergeNeighboringPeaksParam. What would you like to change in my case? I would like to see how you think through the tuning process.

    mpp <- MergeNeighboringPeaksParam(expandRt = 4, ppm = 10)
  3. (following the 3rd question) This Sunday I also tried to look at those bizarre peaks and try to find good parameters for refineChromPeaks:

    • I tried to play around the mpp with the minProp down to 0.05
    • mpp <- MergeNeighboringPeaksParam(expandRt = 4, ppm = 10, minProp = 0.05) and one of the plot confuse me.
    • peak_merged_outcome_wth_minprop0 05
    • The minProp = 0.05 did merge the 1st and 2nd peak (ordered from right to left; and it made sense!); Yet it surprsingly doesn't merge the 2nd and 3rd peak. And I get confused by why (1) they are not merged; (2) the pick peaking define the peak somehow intruding another one (Maybe it is because of the wavelet algorithm? ).
    • And this made the quantification annoying as you may see below (some samples consider both 2nd and 3rd as one and some samples don't):
    • quantification_correspondence_result
    • Another single sample to make things even more difficult to understand

Again, I really appreciate on the explanation!

Thanks, Minghao Gong

gmhhope commented 3 years ago

I cannot help but pack up all the same plot from different samples. And that may help illustrate my question better:

peak_merging_issue

And I assumed this may make quantification troublesome. For example, if doing "sum" of these peaks which probably corresponds to the same feature, there may have double values on the regions which overlapped? Is that so?

jorainer commented 3 years ago

1) there is no additional signal integration/peak area calculation at the stage of featureValues or quantify. The parameter method allows you to decide which of the peak intensities to return for features with multiple peaks in the same sample. With method = "sum" you simply sum up the peak intensities of all peaks.

2) that's hart to tell. We're usually using values between 1.8 and 2, but our peaks are generally only ~ 4 seconds wide. I guess in your case you could go up a little - but again, it's hard to tell. I would try different settings and then compare the results you get.

3) Also here, hard to tell. Best is to look what these settings do to your data. The settings should be OK, because in the end it only considers peaks for merging that are less than 4 seconds apart and have a difference in m/z smaller 10 ppm.

4) this is really quite disturbing. It looks like you have what I call an overlapping peak (that's why the grey is a little darker). You've successfully merged the first and second peak, and also it looks like the second and third were already merged before. But you have an additional peak in here. I would suggest to call chromPeaks on your object (eventually after calling filterFile to subset to the specific sample) with parameters rt = c(10, 50) and mz = c(134.0794, 134.1179) which will list you all chromatographic peaks in that region. Then please check the "mzmin" and "mzmax" columns. What I suspect is that there is an additional, maybe small, peak at a slightly different m/z than the 3 main peaks. The visualization with chromatogram might be misleading here, because it creates a total ion chromatogram from all signal in the rt and m/z range you provided. The visualization of the chromatographic peaks in that region bases only on their rtmin and rtmax values which can be problematic if the m/z range is large and you might have several peaks with slightly different m/z values. In fact, your m/z range looks rather wide, so maybe you're having signal from different ions in that range? If you subsetted your XCMSnExp to the one file you could also call plot(filterMz(filterRt(your_object, rt = c(10, 50))), mz = c(134.07, 134.12)), type = "XIC"). This will show you the TIC for that region but also all individual data points in the rt-m/z plane.

And yes, summing up the signal is problematic if you have overlapping peaks. But since the featureValues and quantify don't re-integrate the signal but simply sum up the values of the "into" column (in the chromPeaks matrix) from all chromatographic peaks assigned to a feature within one sample it should be fine.

In your case I would maybe just use the refineChromPeaks to avoid having overlapping chrom peaks (with the same m/z) and then tune the bw parameter in the correspondence to group your 3 peaks into one feature.

gmhhope commented 3 years ago

Appreciation

Thanks Johannes for the extensive answers! They help a lot!

The bw test during Correspondence

Here shows the bw test and it looks like bw = 5 works for me at least in this case. Thank you!

Screen Shot 2021-04-12 at 8 49 13 AM

Peak overlap of two m/z ion confirmed but question related to the mz range header of the output of `plot(feature_chroms(xdata, features = "FTXXX"))

jorainer commented 3 years ago

Agree, bw = 5 looks OK.

Hm, so, if I understood, you get this rather large m/z range by extracting the chromatogram for one feature. Two things:

What can you do: change the binSize setting of PeakDensityParam. The default is binSize = 0.25 thus, the algorithm will go through m/z ranges of 0.25 along the m/z dimension to group peaks within that range into the features. We usually use binSize = 0.02 or even lower to avoid grouping of peaks with different m/z into the same feature.

For the chrom peaks that are assigned to a feature: in the DataFrame returned by featureDefinitions is a column called "peakidx" (or similar, don't remember by heart now) which is a list with the indices of the chromatographic peaks of the feature in the chromPeaks matrix.

gmhhope commented 3 years ago

The binSize is something I ignored before. That should be a promising thing to tackle the problem. Thanks very much. I will update you with any progress!

Thanks, Minghao

gmhhope commented 3 years ago

Finally, it is the binSize that saves my result. There are actually many parameters in need of modification, as the data were run in Qexactive instrument. So I finally use binSize = 0.005, in the corresponding step which is most effective to remove those overlap peaks (but different mz range) in a feature. I also set ppm = 1 for CentWaveParam, bw = 2 in corresponding step and quantify(... method = "sum") to optimize our results in Qexactive instrument.

Thanks for all your help! You can close this issue now!

Best regards, Minghao

jorainer commented 3 years ago

Thanks for the feedback @gmhhope