Data independent acquisition/SWATH support

jorainer commented 5 years ago

Enable analysis of data independent acquisition (including SWATH) data.

[x] peak detection in pockets/isolation windows.
[x] data structures for identified chromatographic peaks (issue #346).
[x] functionality to build MS2 spectra.
[x] vignette describing the new functionality.

Concepts are based on https://github.com/michaelwitting/metabolomics2018 from @michaelwitting

jorainer commented 5 years ago

Data structures for identified chrom peaks:

It is possible to add arbitrary annotations to each chromatographic peak using the chromPeakData DataFrame. Default columns in that DataFrame are ms_level and is_filled. To enable SWATH support there will also be a column isolationWindow that identifies the isolation window in which the peak was detected.
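To illustrate how such a per-peak annotation table could be queried, here is a minimal sketch. It uses a base R data.frame as a stand-in for the DataFrame returned by chromPeakData; the peak IDs, the window labels ("w1", "w2") and all values are made up for illustration.

```r
## Stand-in for chromPeakData(): a base data.frame instead of a DataFrame.
## All IDs, window labels and values below are hypothetical.
cpd <- data.frame(
    ms_level = c(1L, 1L, 2L, 2L, 2L),
    is_filled = FALSE,
    isolationWindow = factor(c(NA, NA, "w1", "w1", "w2")),
    row.names = c("CP1", "CP2", "CP3", "CP4", "CP5")
)

## IDs of the MS2 peaks that were detected in isolation window "w1".
## %in% treats NA (MS1 peaks) as "no match", so no NA ends up in the index.
in_w1 <- rownames(cpd)[cpd$ms_level == 2L & cpd$isolationWindow %in% "w1"]
in_w1
```

MS1 peaks simply carry NA in the isolationWindow column, so the same table can hold peaks from both MS levels.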

jorainer commented 5 years ago

Peak detection within isolation windows is possible with the findChromPeaksIsolationWindow function. The isolation windows (i.e. the definition of which spectra belong to which isolation window) can be specified with the isolationWindow parameter.
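As a rough sketch of what "which spectra belong to which isolation window" means, the spectra can be grouped by a factor with one level per window. The header table below is hypothetical and only mimics per-spectrum metadata; it is not the actual xcms implementation.

```r
## Hypothetical per-spectrum metadata (one row per spectrum, in scan order).
hdr <- data.frame(
    msLevel = c(1L, 2L, 2L, 1L, 2L, 2L),
    isolationWindowTargetMZ = c(NA, 200, 300, NA, 200, 300)
)

## One factor level per isolation window; MS1 spectra get NA.
win <- factor(hdr$isolationWindowTargetMZ)

## Indices of the spectra belonging to each window (NA entries are dropped).
spectra_by_window <- split(seq_len(nrow(hdr)), win)
spectra_by_window
```

Peak detection would then be run separately on each of these groups of spectra.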

jorainer commented 5 years ago

For each MS1 peak we then have to:

identify fragment candidates (MS2 peaks) within the same rt window.
extract the chromatograms for all of them.
align the chromatograms.
correlate the chromatograms.
reconstruct the MS2 spectrum from the MS2 peaks with correlation > x.

@michaelwitting, is that correct?
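The align, correlate and reconstruct steps listed above could be sketched in base R roughly as follows. This is only an illustration under assumptions: chromatograms are plain data.frames with rt and int columns, alignment is done by linear interpolation onto the MS1 retention times, the helper correlate_chrom is hypothetical, and the cut-off of 0.9 for x is an arbitrary choice.

```r
## Align chromatogram `y` to the retention times of `x` by linear
## interpolation and return the Pearson correlation of the intensities.
correlate_chrom <- function(x, y) {
    y_on_x <- approx(y$rt, y$int, xout = x$rt)$y  # NA outside the rt range of y
    cor(x$int, y_on_x, use = "pairwise.complete.obs")
}

## Toy data: one MS1 peak and two MS2 fragment candidates.
rt1 <- seq(10, 20, by = 0.5)
ms1 <- data.frame(rt = rt1, int = dnorm(rt1, mean = 15, sd = 1))
rt2 <- rt1 + 0.2  # MS2 scans are recorded slightly later than the MS1 scans
ms2 <- list(
    frag_a = data.frame(rt = rt2, mz = 85.03, int = 0.4 * dnorm(rt2, 15, 1)),
    frag_b = data.frame(rt = rt2, mz = 127.04, int = dnorm(rt2, 18, 1))
)

## Correlate each candidate with the MS1 chromatogram.
cors <- vapply(ms2, correlate_chrom, numeric(1), x = ms1)

## Keep candidates above the threshold and build the MS2 spectrum from them,
## using the apex intensity of each fragment chromatogram.
min_cor <- 0.9
keep <- ms2[cors > min_cor]
spectrum <- data.frame(
    mz = vapply(keep, function(z) z$mz[1], numeric(1)),
    intensity = vapply(keep, function(z) max(z$int), numeric(1))
)
spectrum
```

In this toy example only frag_a co-elutes with the MS1 peak, so it is the only fragment that ends up in the reconstructed spectrum.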

sneumann commented 5 years ago

Parts of that were done in a prototype using CAMERA, i.e. groupCorr() in an MS2 pocket gives a spectrum, and then we "only" need to find which MS1 precursor it might originate from. My prototype did not correlate the MS1 and MS2 chromatograms. Would it be interesting to calculate and attach a "TIC" chromatogram for all MS2 peaks in a collected MS2 spectrum, since it will be smoother than the individual ones? Yours, Steffen

michaelwitting commented 5 years ago

@jorainer, yes, correct so far. I started from the MS1 peak, checked which pocket it might fall into, and got all the MS2 peaks that were within a certain RT range around the MS1 peak, e.g. +/- 0.1 minutes around the RT of the MS1 peak. I have some prototype code here for the alignment and correlation. I will finish it and push it this evening.
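The candidate selection described here (find the pocket of the MS1 peak, then take the MS2 peaks from that pocket within +/- 0.1 minutes) might look like the sketch below. The window boundaries, peak values and column names are hypothetical, and retention times are assumed to be in seconds.

```r
## Hypothetical isolation windows (pockets) with their m/z boundaries.
windows <- data.frame(id = c("w1", "w2"), lower = c(100, 200),
                      upper = c(200, 300), stringsAsFactors = FALSE)

## One MS1 peak and a few MS2 peaks (rt in seconds, values made up).
ms1_peak <- list(mz = 153.07, rt = 312.4)
ms2_peaks <- data.frame(
    mz = c(85.03, 110.06, 97.03, 135.05),
    rt = c(311.9, 330.0, 312.8, 313.0),
    isolationWindow = c("w1", "w1", "w2", "w1"),
    stringsAsFactors = FALSE
)

rt_tol <- 0.1 * 60  # +/- 0.1 minutes expressed in seconds

## Pocket whose m/z range contains the precursor m/z of the MS1 peak.
pocket <- windows$id[ms1_peak$mz >= windows$lower &
                     ms1_peak$mz < windows$upper]

## MS2 peaks from that pocket that elute close to the MS1 peak.
candidates <- ms2_peaks[ms2_peaks$isolationWindow == pocket &
                        abs(ms2_peaks$rt - ms1_peak$rt) <= rt_tol, ]
candidates
```

The resulting candidates are the peaks whose chromatograms would then be aligned and correlated with the MS1 chromatogram.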

sneumann commented 5 years ago

Side note: there is public SWATH data as mzML in https://www.ebi.ac.uk/metabolights/MTBLS297. I could create a package mtbls297, similar to mtbls2, which could be used in a new vignette? The vignette could live in mtbls297, saving us the hassle of having another few dozen raw data files in Suggests for xcms. Yours, Steffen

michaelwitting commented 5 years ago

Sounds like a good idea. Since you know the people behind this dataset quite well, there should also be no political problems ;-)

jorainer commented 5 years ago

Yes @sneumann! That would be awesome! So far I am using @michaelwitting's toy data set and I was trying to talk him into adding that to the msdata package.

jorainer commented 5 years ago

Actually, it might still be helpful to add one SWATH mzML file to msdata to have something for the unit tests...

michaelwitting commented 5 years ago

No problem. Just take my toy data set. We can have it in for the next Bioconductor release.

sneumann commented 5 years ago

Get files from mtbls297 package:

library(Risa)
library(xcms)

ISAmtbls297 <- readISAtab(find.package("mtbls297"))
assay <- ISAmtbls297@assay.tabs[[1]]
msfiles <- paste(find.package("mtbls297"), "mzML",
                 assay@assay.file$"Derived Spectral Data File",
                 sep="/")

Works for the above AB Sciex data and for Bruker mid-band CID so far. MS1 peak picking:

cwp <- CentWaveParam(ppm = 25, peakwidth = c(10, 20), snthresh = 10,
  prefilter = c(3, 100), mzCenterFun = "wMean", integrate = 1L,
  mzdiff = -0.001, fitgauss = FALSE, noise = 0, verboseColumns = FALSE,
  roiList = list(), firstBaselineCheck = TRUE, roiScales = numeric())

raw_data <- readMSData(msfiles, mode = "onDisk")

## Perform the peak detection using the settings defined above.
mtbls297 <- findChromPeaks(raw_data, param = cwp, BPPARAM = MulticoreParam())

Now get the SWATH data:

x2 <- findChromPeaksIsolationWindow(mtbls297, 
                                    param = cwp, 
                                    BPPARAM = MulticoreParam())
cpd <- chromPeakData(x2)

Although there is no isolation window data yet:

> head(cpd)
DataFrame with 6 rows and 6 columns
        ms_level is_filled isolationWindow isolationWindowTargetMZ
       <integer> <logical>        <factor>               <numeric>
CP0001         1     FALSE              NA                      NA
CP0002         1     FALSE              NA                      NA
CP0003         1     FALSE              NA                      NA
CP0004         1     FALSE              NA                      NA
CP0005         1     FALSE              NA                      NA
CP0006         1     FALSE              NA                      NA

jorainer commented 5 years ago

@sneumann, can you share the mtbls297 package somehow?

jorainer commented 5 years ago

The code to align Chromatogram objects and to correlate them will be implemented in methods #379 and #380 - these might eventually then go to MSnbase.

michaelwitting commented 5 years ago

One thought that came to my mind is that we reconstruct the spectra for each ChromPeak, which would mean also for isotopes, adducts etc. Do we want this behavior? It could also be used somehow at a later stage, e.g. different isotopes should have the same reconstructed MS2 spectrum.

jorainer commented 5 years ago

Agree, but I would do this in a second step. IMHO it would be easier (and safer) to define the MS2 spectrum for each chromatographic peak (in each file) separately (without taking any other information into account) and then do the refinements later (e.g. with combineSpectra to define the common MS2 spectrum for isotopes).

If we see that this will not work or if we see improvements we can then later implement more sophisticated approaches.

jorainer commented 5 years ago

Or implement an additional reconstructFeatureSpectra that does take the correlation of the peaks across samples into account.

michaelwitting commented 5 years ago

Okay... Let's do it that way. I'm sorry, I'm a bit behind with everything. Mostly with my habilitation, which is due in 4 months. So I have to hurry up a bit...

jorainer commented 5 years ago

If you are too busy @michaelwitting I can implement the function to reconstruct the MS2 spectrum (#377) and let you have a look at whether it makes sense. Once we have that function we can think about how to improve it (e.g. include correlation across samples or similar, as discussed in #377).

jorainer commented 5 years ago

BTW, if I'm not mistaken you said you were working on the SWATH vignette @michaelwitting. If so, can you push or make a pull request?

michaelwitting commented 5 years ago

Yes. Will come soon. Problems with R, everything was lost. You will get a push soon.

michaelwitting commented 5 years ago

Added a few things to the vignette and also split out the tomato part (a vignette on its own). Once we have the reconstruction function I can finish the part on the example substance.

jorainer commented 5 years ago

For the things that are still missing: I'm also quite busy at present, but we could discuss and implement them in Den Haag at the latest.

michaelwitting commented 5 years ago

Let's do it in The Hague. I'm also away at another conference from Saturday. See you there!

sneumann / xcms

Data independent acquisition/SWATH support #375