Closed gmhhope closed 3 years ago
Hi Minghao,
- Some of the chromatograms have shading and some don't have but with only lines...
correct, the shading indicates the actually detected chromatographic peaks. For your first feature in the pdf: I guess that the peak detection run did find a large peak in some samples and did split the peak (i.e. found two separate ones) for other ssamples (the red lines without shading). This could be a problem as the peak area in some samples is much larger than in others. Maybe some of the problems could be solved with refineChromPeaks
with MergeNeighboringPeaksParam
(see the vignette for some info on that).
- In terms of mz range and retention time range...
You're right. The "mzmin"
, "mzmax"
, "rtmin"
and "rtmax"
in the featureDefinitions
represents the min and max of the m/z and rts of the peak apex of each chromatographic peak of the feature. It is not the actual m/z and rt range of all chromatographic peaks of the feature. Thus, "rtmin"
and "rtmax
" in featureDefinitions
is the min and max of the values in column "rt"
of the chromPeaks
matrix (and the same for "mzmin"
and "mzmax"
) - it is not the min and max of the "rtmin"
and "rtmax"
columns in chromPeaks
. For the rtmin and rtmax, what you have in the featureDefinitions
is actually the min and max rt that of the points in the lower part of the chromPeakDensity
plots.I know, this is misleading, but I did not want to change this from the original xcms
implementation.
- I saw some samples have two peaks on a single sample
Hard to answer. For our data we always run the refineChromPeaks
and then sometimes I also use more "relaxed" settings in the correspondence by choosing a higher bw
and thus include neighboring chromatographic peaks (in the same sample) into the same feature. But this really depends on the samples/experiment. I do this mostly for larger data sets where I see that alignment and correspondence have a difficult time to separate isomers (typical candidates for us are fructose and glucose, or leucine and isoleucine). When I call then quantify
or featureValues
to extract the feature intensities I then also use the parameter method = "sum"
which will sum up the intensities of all chromatographic peaks in the same sample assigned to the same feature. I don't think you did anything wrong, but it's good to evaluate these plots and then you have to decide how to proceed, e.g. refine some of your settings (e.g. increase/decrease bw
or play with the settings of refineChromPeaks
).
- (optional question) Somehow I cannot get consistent direction of FC differences
I do not quite understand what you mean with consistent direction of the FC - but I guess that has to do with your specific data set. Note also that at this stage you still have raw abundances that you might or might not have to normalize (depending also on the experimental set up, how the samples were measured etc). A PCA could however always be helpful here to see if the samples generally group as expected. (also you might want to run fillChromPeaks
before the PCA...)
Thanks for the answer Johannes,
I have been very busy this week and I am reading carefully with your note. Thanks for the detailed explanation!
quantify
method and you suggested me to do method with 'sum'.
When I call then quantify or featureValues to extract the feature intensities I then also use the parameter method = "sum" which will sum up the intensities of all chromatographic peaks in the same sample assigned to the same feature.
Is the default method "trapezoidation" (the area under the peaks)? And I did observe some differences in certain sample within a particular feature. So if there are two peaks in one feature, how the default method behave comparing to "sum" method? Will it pick only one peak for the quantification by the Area under peak?
> quanitfy_wt_sum['D30_1_1.mzML']
D30_1_1.mzML
1065558
> quantify_wt_default['D30_1_1.mzML']
D30_1_1.mzML
299604
What will be a reasonable range of bw
? My previous setting is 1.8 and it looks like it is too small.
pdp <- PeakDensityParam(sampleGroups = xdata$sample_group,
minFraction = 0.8, bw = 1.8)
And I did run MergeNeighboringPeaksParam
. What would you like to change in my case? I would like to see how you think through the tuning process.
mpp <- MergeNeighboringPeaksParam(expandRt = 4, ppm = 10)
(following the 3rd question) This Sunday I also tried to look at those bizarre peaks and try to find good parameters for refineChromPeaks
:
mpp
with the minProp down to 0.05mpp <- MergeNeighboringPeaksParam(expandRt = 4, ppm = 10, minProp = 0.05)
and one of the plot confuse me. Again, I really appreciate on the explanation!
Thanks, Minghao Gong
I cannot help but pack up all the same plot from different samples. And that may help illustrate my question better:
And I assumed this may make quantification troublesome. For example, if doing "sum" of these peaks which probably corresponds to the same feature, there may have double values on the regions which overlapped? Is that so?
1) there is no additional signal integration/peak area calculation at the stage of featureValues
or quantify
. The parameter method
allows you to decide which of the peak intensities to return for features with multiple peaks in the same sample. With method = "sum"
you simply sum up the peak intensities of all peaks.
2) that's hart to tell. We're usually using values between 1.8
and 2
, but our peaks are generally only ~ 4 seconds wide. I guess in your case you could go up a little - but again, it's hard to tell. I would try different settings and then compare the results you get.
3) Also here, hard to tell. Best is to look what these settings do to your data. The settings should be OK, because in the end it only considers peaks for merging that are less than 4 seconds apart and have a difference in m/z smaller 10 ppm.
4) this is really quite disturbing. It looks like you have what I call an overlapping peak (that's why the grey is a little darker). You've successfully merged the first and second peak, and also it looks like the second and third were already merged before. But you have an additional peak in here. I would suggest to call chromPeaks
on your object (eventually after calling filterFile
to subset to the specific sample) with parameters rt = c(10, 50)
and mz = c(134.0794, 134.1179)
which will list you all chromatographic peaks in that region. Then please check the "mzmin"
and "mzmax"
columns. What I suspect is that there is an additional, maybe small, peak at a slightly different m/z than the 3 main peaks. The visualization with chromatogram
might be misleading here, because it creates a total ion chromatogram from all signal in the rt and m/z range you provided. The visualization of the chromatographic peaks in that region bases only on their rtmin and rtmax values which can be problematic if the m/z range is large and you might have several peaks with slightly different m/z values. In fact, your m/z range looks rather wide, so maybe you're having signal from different ions in that range? If you subsetted your XCMSnExp
to the one file you could also call plot(filterMz(filterRt(your_object, rt = c(10, 50))), mz = c(134.07, 134.12)), type = "XIC")
. This will show you the TIC for that region but also all individual data points in the rt-m/z plane.
And yes, summing up the signal is problematic if you have overlapping peaks. But since the featureValues
and quantify
don't re-integrate the signal but simply sum up the values of the "into"
column (in the chromPeaks
matrix) from all chromatographic peaks assigned to a feature within one sample it should be fine.
In your case I would maybe just use the refineChromPeaks
to avoid having overlapping chrom peaks (with the same m/z) and then tune the bw
parameter in the correspondence to group your 3 peaks into one feature.
Thanks Johannes for the extensive answers! They help a lot!
bw
test during CorrespondenceHere shows the bw
test and it looks like bw = 5
works for me at least in this case. Thank you!
I am repotting some of the figures you suggest. And it did show me that there are two m/z peaks in that region. Thanks for solving the most confusing part I have. Thank you for the great catch!
However, the interesting thing is that the m/z range I have on the plot mz = c(134.07, 134.12)
is the header of the output figure when I plot the feature chromatography by calling the following function.
feature_chroms <- featureChromatograms(xdata, features = "FTXXX")
plot(feature_chroms, col = sample_colors,
peakBg = sample_colors[chromPeaks(feature_chroms)[, "sample"]])
ppm = 10
across the whole script. So how come the mzmin
and mzmax
on the peaks corresponding to this feature become so large (the difference of 134.07-134.12 is > 200+ ppm ?) Agree, bw = 5
looks OK.
Hm, so, if I understood, you get this rather large m/z range by extracting the chromatogram for one feature. Two things:
What can you do: change the binSize
setting of PeakDensityParam
. The default is binSize = 0.25
thus, the algorithm will go through m/z ranges of 0.25 along the m/z dimension to group peaks within that range into the features. We usually use binSize = 0.02
or even lower to avoid grouping of peaks with different m/z into the same feature.
For the chrom peaks that are assigned to a feature: in the DataFrame
returned by featureDefinitions
is a column called "peakidx"
(or similar, don't remember by heart now) which is a list
with the indices of the chromatographic peaks of the feature in the chromPeaks
matrix.
The binSize
is something I ignored before. That should be a promising thing to tackle the problem. Thanks very much. I will update you with any progress!
Thanks, Minghao
Finally, it is the binSize
that saves my result. There are actually many parameters in need of modification, as the data were run in Qexactive instrument. So I finally use binSize = 0.005
, in the corresponding step which is most effective to remove those overlap peaks (but different mz range) in a feature. I also set ppm = 1
for CentWaveParam
, bw = 2
in corresponding step and quantify(... method = "sum")
to optimize our results in Qexactive instrument.
Thanks for all your help! You can close this issue now!
Best regards, Minghao
Thanks for the feedback @gmhhope
Hi Johannes,
I am trying to completely understand the resulted figures of
plotChromPeakDensity
to evaluate extracted features. Although I did try to go through issues and tutorial on the website, it looks like I still need your guidances on this. See the small report I generated.Attachment and codes
if(TRUE) { group_colors <- c("#99999980","#0000ff80", "#ff000080" ) names(group_colors) <- c("G1_Naive","G2_R5pos","G3_R5neg") sample_colors <- group_colors[xdata$sample_group] pdf(paste(output_dir,"incl_list","debugging_chromPeak_evaluation.pdf", sep = ""), width = 10, height = 10) par(mfrow = c(4, 4)) for (i in list_FT) { feature_chroms <- featureChromatograms(xdata, features = i) plotChromPeakDensity(feature_chroms, col = sample_colors, simulate = FALSE, # param = pdp, peakBg = sample_colors[chromPeaks(feature_chroms)[, "sample"]], peakCol = sample_colors[chromPeaks(feature_chroms)[, "sample"]], peakPch = 16, sub = incl_l[incl_l$X == i,"sub_title"])
plot(feature_chroms, col = sample_colors, xlab = "",
peakBg = sample_colors[chromPeaks(feature_chroms)[, "sample"]], sub = incl_l[incl_l$X == i,"sub_title"]) #as.character(i)
} dev.off() }