Open jorainer opened 5 years ago
Data structures for identified chrom peaks: the chromPeakData DataFrame. Default columns in that DataFrame are ms_level and is_filled; to enable SWATH support we will also have a column isolationWindow that identifies in which isolation window the peak was identified.
Peak detection within isolation windows is possible with the findChromPeaksIsolationWindow function. The isolation window (i.e. the definition of which spectra belong to which isolation window) can be specified with the isolationWindow parameter.
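To make the proposed layout concrete, here is a minimal sketch of such a peak annotation table. It is only an illustration: a base R data.frame stands in for the S4Vectors DataFrame, and the peak IDs, window labels and values are invented.

```r
## Toy stand-in for chromPeakData(); a base data.frame is used instead of an
## S4Vectors DataFrame, and all values are invented for illustration.
cpd <- data.frame(
    ms_level = c(1L, 1L, 2L, 2L, 2L),
    is_filled = FALSE,
    isolationWindow = factor(c(NA, NA, "w1", "w1", "w2")),
    row.names = sprintf("CP%04d", 1:5)
)

## MS1 peaks carry no isolation window ...
ms1_peaks <- cpd[cpd$ms_level == 1L, ]

## ... while MS2 peaks can be grouped by the window they were detected in.
is_ms2 <- cpd$ms_level == 2L
ms2_by_window <- split(rownames(cpd)[is_ms2],
                       droplevels(cpd$isolationWindow[is_ms2]))
```

Keeping the window as a factor column means the MS2 peaks of one isolation window can be pulled out with a plain subset or split, without touching the peak matrix itself.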
For each MS1 peak we have then to
@michaelwitting, is that correct?
Parts of that were done in a prototype using CAMERA, i.e. groupCorr() in an MS2 pocket gives a spectrum, and then we "only" need to find from which MS1 precursor that might originate. My prototype did not correlate the MS1 and MS2 chromatograms. Would it be interesting to calculate and attach a "TIC" chromatogram for all MS2 peaks in a collected MS2 spectrum, since it will be smoother than the individual ones? Yours, Steffen
@jorainer, yes, correct so far. I started from the MS1 peak, checked in which pocket it might fall, and got all the MS2 peaks that were within a certain RT range around the MS1 peak, e.g. +/- 0.1 minutes around the RT of the MS1 peak. I have some prototype code here for the alignment and correlation. I will finish it and push it this evening.
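The lookup described above could be sketched roughly as follows. This is not the prototype code mentioned in the comment: ms2_candidates is a hypothetical helper, plain data frames stand in for the xcms peak tables, the column names are assumptions, and retention times are taken to be in seconds (so 0.1 minutes becomes 6).

```r
## Hypothetical helper: select the MS2 peaks that could belong to an MS1 peak.
## ms2 needs columns rt, window_lower and window_upper (assumed names), the
## latter two being the m/z bounds of the isolation window ("pocket").
ms2_candidates <- function(ms1_mz, ms1_rt, ms2, rt_tol = 6) {
    ## The MS1 m/z has to fall into the isolation window of the MS2 peak ...
    in_window <- ms2$window_lower <= ms1_mz & ms1_mz <= ms2$window_upper
    ## ... and the MS2 peak has to elute close to the MS1 peak.
    in_rt <- abs(ms2$rt - ms1_rt) <= rt_tol
    ms2[in_window & in_rt, , drop = FALSE]
}

## Invented example data: four MS2 peaks from two isolation windows.
ms2 <- data.frame(
    mz = c(85.03, 127.04, 145.05, 99.04),
    rt = c(301, 304, 330, 302),
    window_lower = c(150, 150, 150, 200),
    window_upper = c(200, 200, 200, 250)
)
hits <- ms2_candidates(ms1_mz = 181.07, ms1_rt = 300, ms2 = ms2)
```

Here only the first two MS2 peaks are kept: the third elutes too late and the fourth comes from a window that does not contain the MS1 m/z.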
Side-note: there is public SWATH data as mzML in https://www.ebi.ac.uk/metabolights/MTBLS297
I could create a package mtbls297 similar to mtbls2, which could be used in a new vignette? The vignette could live in mtbls297, saving us the hassle of having another few dozen raw data files in suggests for xcms. Yours, Steffen
Sounds like a good idea. Since you know the people from this dataset quite well, there should be also no political problems ;-)
Yes @sneumann! That would be awesome! So far I am using @michaelwitting's toy data set and I was trying to talk him into adding that to the msdata package.
Actually, it might still be helpful to add one SWATH mzML file to msdata to have something for the unit tests...
No problem. Just take my toy data set. We can have it in for the next Bioconductor release.
Get files from mtbls297 package:
library(Risa)
library(xcms)
ISAmtbls297 <- readISAtab(find.package("mtbls297"))
assay <- ISAmtbls297@assay.tabs[[1]]
msfiles <- paste(find.package("mtbls297"), "mzML",
                 assay@assay.file$"Derived Spectral Data File",
                 sep="/")
Works for the above AB Sciex and Bruker mid-band CID data so far. MS1 peak picking:
cwp <- CentWaveParam(ppm = 25, peakwidth = c(10, 20), snthresh = 10,
                     prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
                     mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
                     roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric())
raw_data <- readMSData(msfiles, mode = "onDisk")
## Perform the peak detection using the settings defined above.
mtbls297 <- findChromPeaks(raw_data, param = cwp, BPPARAM = MulticoreParam())
Now get the SWATH data:
x2 <- findChromPeaksIsolationWindow(mtbls297,
                                    param = cwp,
                                    BPPARAM = MulticoreParam())
cpd <- chromPeakData(x2)
Although no data yet:
> head(cpd)
DataFrame with 6 rows and 6 columns
        ms_level is_filled isolationWindow isolationWindowTargetMZ
       <integer> <logical>        <factor>               <numeric>
CP0001         1     FALSE              NA                      NA
CP0002         1     FALSE              NA                      NA
CP0003         1     FALSE              NA                      NA
CP0004         1     FALSE              NA                      NA
CP0005         1     FALSE              NA                      NA
CP0006         1     FALSE              NA                      NA
@sneumann, can you share the mtbls297 package somehow?
The code to align Chromatogram objects and to correlate them will be implemented in methods #379 and #380 - these might eventually then go to MSnbase.
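As a rough idea of what aligning and correlating two chromatograms involves, here is a base R sketch. It is not the implementation planned in #379 and #380: correlate_chrom and peak are hypothetical helpers, plain numeric vectors replace Chromatogram objects, and linear interpolation is just one possible way to bring MS1 and MS2 signals onto the same retention times.

```r
## Hypothetical helper: interpolate chromatogram y onto the retention times
## of chromatogram x, then calculate the Pearson correlation.
correlate_chrom <- function(rt_x, int_x, rt_y, int_y) {
    ## Linear interpolation; rt_x values outside the range of rt_y give NA.
    int_y_aligned <- approx(rt_y, int_y, xout = rt_x)$y
    ## NA pairs are dropped before correlating.
    cor(int_x, int_y_aligned, use = "pairwise.complete.obs")
}

## Simulated Gaussian peaks; MS2 scans are offset by 0.5 s from MS1 scans.
peak <- function(rt, apex, height) height * exp(-(rt - apex)^2 / (2 * 1.5^2))
rt_ms1 <- seq(295, 305, by = 1)
rt_ms2 <- seq(295.5, 305.5, by = 1)

## Co-eluting fragment versus a fragment eluting 3 s later.
r_same <- correlate_chrom(rt_ms1, peak(rt_ms1, 300, 1e5),
                          rt_ms2, peak(rt_ms2, 300, 2e3))
r_shifted <- correlate_chrom(rt_ms1, peak(rt_ms1, 300, 1e5),
                             rt_ms2, peak(rt_ms2, 303, 2e3))
```

The co-eluting fragment gives a correlation close to 1 despite the very different intensities, while the shifted one scores clearly lower, which is the property a cut-off for assigning MS2 peaks to an MS1 peak would rely on.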
One thought that came to my mind is that we reconstruct the spectra for each ChromPeak, that would mean also for isotopes, adducts etc.
Do we want this behavior? Could be somehow also used at a later stage, e.g. different isotopes should have the same reconstructed MS2 spectrum.
agree - but I would do this in a second step. IMHO it would be easier (and safer) to define the MS2 spectrum for each chromatographic peak (in each file) separately (without taking any other information into account) and then do the refinements later (e.g. with combineSpectra to define the common MS2 spectrum for isotopes).
If we see that this will not work or if we see improvements we can then later implement more sophisticated approaches.
or implement an additional reconstructFeatureSpectra that does take correlation of the peaks across samples into account.
Okay... Let's do it that way. I'm sorry, I'm a bit behind with everything. Mostly with my habilitation, which is due in 4 months. So I have to hurry up a bit...
If you are too busy @michaelwitting I can implement the function to reconstruct the MS2 spectrum (#377) and let you have a look at it to see if it makes sense. Once we have that function we can think about how to improve it (e.g. include correlation across samples or similar, as discussed in #377).
BTW, if I'm not mistaken you said you were working on the swath vignette @michaelwitting - if so, can you push or make a pull request?
Yes. Will come soon. Problems with R, everything was lost. You will get a push soon.
Added a few things to the vignette and also split out the tomato part (a vignette on its own). Once we have the reconstruction function I can finish the part on the example substance.
For the things that are still missing - I'm also quite busy at present, but we could discuss and implement them in Den Haag at the latest.
Let's do it in The Hague. I'm also out at another conference from Saturday. See you there!
Enable analysis of data independent acquisition (including SWATH) data.
Concepts based on https://github.com/michaelwitting/metabolomics2018 from @michaelwitting