sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

featureSpectra does not find MS2 spectra #610

Closed eterlova closed 2 years ago

eterlova commented 2 years ago

Hello all,

I ran into trouble trying to extract MS2 spectra using the featureSpectra function. The function runs but finds zero spectra, even though I can see them in the dataset. What do you think could be wrong?

> xdata <- findChromPeaks(rawdata_cent, param=cwp)
> xdata <- findChromPeaks(xdata, param = cwp, msLevel = 2L, add = TRUE)
> xdata_pp <- refineChromPeaks(xdata, MergeNeighboringPeaksParam(expandRt = 2))
> xdata_rtaligned <- adjustRtime(xdata_pp, param = ObiwarpParam(binSize = 0.4))
> xdata_correspondence <- groupChromPeaks(xdata_rtaligned, param = PeakDensityParam(sampleGroups = xdata_rtaligned$sample_group, minFraction = 0.4, bw = 30))
> xdata_filled <- fillChromPeaks(xdata_correspondence, param = ChromPeakAreaParam())

> filteredMs2Spectra <- featureSpectra(xdata_filled, return.type = "MSpectra")
> filteredMs2Spectra
MSpectra with 0 spectra and 2 metadata column(s):

I thought that was suspicious, so I ran a couple of "tests":

> table(msLevel(xdata_filled))
    1     2 
32185 30804 
> table(msLevel(rawdata_cent))
    1     2 
32185 30804 
> head(chromPeaks(xdata_filled, msLevel = 2L))
               mz    mzmin    mzmax       rt    rtmin    rtmax      into
CP000011 154.9633 154.9583 154.9653 60.30496 56.08234 63.20417 2448.0084
CP000021 270.9460 270.9428 270.9502 59.88330 56.08234 63.20417 1230.0382
CP000031 427.9043 427.8996 427.9111 60.30496 56.50516 65.30382  730.3914
CP000041 226.9891 226.9857 226.9929 60.72538 56.08234 63.63152 1771.7402
CP000051 382.9287 382.9251 382.9304 60.30496 56.50516 63.63152 2069.3752
CP000061 383.9299 383.9281 383.9379 60.72538 56.08234 63.63152 1094.1621
              intb     maxo  sn sample
CP000011 2441.2190 670.5108 198      1
CP000021 1223.2488 278.3117 207      1
CP000031  723.6684 214.0649 213      1
CP000041 1764.5129 439.7749 152      1
CP000051 2062.5774 512.4502 511      1
CP000061 1086.9348 273.5758 273      1

So I think my MS2 spectra are there, but why does featureSpectra not find them? Session info:

> sessionInfo()
R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/FCAM/eterlova/miniconda3/envs/Rmetab/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] Spectra_1.5.12              data.table_1.14.2          
 [3] pander_0.6.4                RColorBrewer_1.1-2         
 [5] xcms_3.16.1                 MSnbase_2.20.4             
 [7] ProtGenerics_1.26.0         mzR_2.28.0                 
 [9] Rcpp_1.0.8                  BiocParallel_1.28.3        
[11] SummarizedExperiment_1.24.0 Biobase_2.54.0             
[13] GenomicRanges_1.46.1        GenomeInfoDb_1.30.1        
[15] IRanges_2.28.0              S4Vectors_0.32.3           
[17] BiocGenerics_0.40.0         MatrixGenerics_1.6.0       
[19] matrixStats_0.61.0          magrittr_2.0.2             

loaded via a namespace (and not attached):
 [1] vsn_3.62.0             foreach_1.5.2          assertthat_0.2.1      
 [4] BiocManager_1.30.16    affy_1.72.0            GenomeInfoDbData_1.2.7
 [7] robustbase_0.93-9      impute_1.68.0          pillar_1.7.0          
[10] lattice_0.20-45        glue_1.6.2             limma_3.50.1          
[13] digest_0.6.29          XVector_0.34.0         colorspace_2.0-3      
[16] preprocessCore_1.56.0  Matrix_1.4-0           plyr_1.8.6            
[19] MALDIquant_1.21        XML_3.99-0.9           pkgconfig_2.0.3       
[22] zlibbioc_1.40.0        purrr_0.3.4            scales_1.1.1          
[25] RANN_2.6.1             affyio_1.64.0          tibble_3.1.6          
[28] generics_0.1.2         ggplot2_3.3.5          ellipsis_0.3.2        
[31] cli_3.2.0              MassSpecWavelet_1.60.0 crayon_1.5.0          
[34] fs_1.5.2               ncdf4_1.19             fansi_1.0.2           
[37] doParallel_1.0.17      MASS_7.3-55            MsFeatures_1.2.0      
[40] tools_4.1.0            lifecycle_1.0.1        munsell_0.5.0         
[43] cluster_2.1.2          DelayedArray_0.20.0    pcaMethods_1.86.0     
[46] compiler_4.1.0         mzID_1.32.0            rlang_1.0.2           
[49] grid_4.1.0             RCurl_1.98-1.6         iterators_1.0.14      
[52] MsCoreUtils_1.7.4      bitops_1.0-7           gtable_0.3.0          
[55] codetools_0.2-18       DBI_1.1.2              R6_2.5.1              
[58] dplyr_1.0.8            utf8_1.2.2             clue_0.3-60           
[61] parallel_4.1.0         vctrs_0.3.8            DEoptimR_1.0-10       
[64] tidyselect_1.1.2   

Best, Lisa

jorainer commented 2 years ago

I think there is some confusion about the terms. It seems to me that you have DIA data (SWATH or similar?), since you're also running chromatographic peak detection on MS2 and are able to find chrom peaks there. The featureSpectra function was intended more for DDA data: for each feature (or, to be more precise, for each chrom peak) it simply checks whether there is an MS2 spectrum with a precursor m/z within the m/z range and a retention time within the rt range of the chrom peak.

Can you please check precursorMz on your data? Could be that with DIA you don't have that since ions are selected in rather broad m/z windows...
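
To make the matching rule described above concrete, here is a minimal sketch in base R. It is a toy illustration of the idea (precursor m/z inside the peak's m/z range and retention time inside its rt range), not the xcms implementation. The peak boundaries, the spectra table, and the helper `in_range` are all made up for the example.

```r
# Toy illustration in base R; this is NOT the xcms code.
# One hypothetical chromatographic peak, defined by its m/z and rt ranges.
peak <- c(mzmin = 154.958, mzmax = 154.965, rtmin = 56, rtmax = 63)

# Hypothetical MS2 spectrum headers. The DDA-like rows carry the m/z of the
# isolated precursor ion; the DIA-like rows have a missing or a fixed,
# uninformative precursor m/z (assumed values, for illustration only).
ms2 <- data.frame(
    acquisition = c("DDA-like", "DDA-like", "DIA-like", "DIA-like"),
    rtime       = c(60.1, 75.0, 60.1, 60.5),
    precursorMz = c(154.963, 154.963, NA, 500)
)

# Helper: TRUE where x is non-missing and lies within [lo, hi].
in_range <- function(x, lo, hi) !is.na(x) & x >= lo & x <= hi

# A spectrum matches only if BOTH precursor m/z and retention time fall
# inside the ranges of the chromatographic peak.
ms2$matched <- in_range(ms2$precursorMz, peak[["mzmin"]], peak[["mzmax"]]) &
    in_range(ms2$rtime, peak[["rtmin"]], peak[["rtmax"]])

print(ms2)
```

In this sketch only the first DDA-like spectrum satisfies both conditions. The DIA-like rows can never match on precursor m/z, which is the situation suspected above.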

eterlova commented 2 years ago

I have Waters MSe data. I initially ran xcms as you recommend in the tutorials (plus parameter optimization with IPO), with a single peak picking call, but ran into this problem. Then I found a discussion here (issue 451, https://github.com/sneumann/xcms/issues/451) in which Corey Broeckling shows his script for analyzing the same type of data, and decided to try the same. However, it did not make my MS2 spectra appear, and I did not change my script when I posted this message.

The precursor data (looks odd with 1025 being the only non-NA value, doesn't it?):

> head(precursorMz(xdata_filled), 50)
F01.S0001 F01.S0002 F01.S0003 F01.S0004 F01.S0005 F01.S0006 F01.S0007 F01.S0008 
       NA        NA      1025        NA      1025        NA      1025        NA 
F01.S0009 F01.S0010 F01.S0011 F01.S0012 F01.S0013 F01.S0014 F01.S0015 F01.S0016 
     1025        NA      1025        NA      1025        NA      1025        NA 
F01.S0017 F01.S0018 F01.S0019 F01.S0020 F01.S0021 F01.S0022 F01.S0023 F01.S0024 
     1025        NA      1025        NA      1025        NA      1025        NA 
F01.S0025 F01.S0026 F01.S0027 F01.S0028 F01.S0029 F01.S0030 F01.S0031 F01.S0032 
     1025        NA      1025        NA      1025        NA      1025        NA 
F01.S0033 F01.S0034 F01.S0035 F01.S0036 F01.S0037 F01.S0038 F01.S0039 F01.S0040 
     1025        NA      1025        NA      1025        NA      1025        NA 
F01.S0041 F01.S0042 F01.S0043 F01.S0044 F01.S0045 F01.S0046 F01.S0047 F01.S0048 
     1025        NA      1025        NA      1025        NA        NA      1025 
F01.S0049 F01.S0050 
       NA      1025 

jorainer commented 2 years ago

If I'm not mistaken, MSe data is similar to Sciex SWATH: MS2 spectra are not created for a single ion but for all ions within an m/z window. Thus, it will not be possible to use the featureSpectra function to extract MS2 spectra.

Maybe check the SWATH data analysis section in the LC-MS/MS vignette. With MSe/SWATH data you also perform chromatographic peak detection on MS2 level and then try to reconstruct (build) the MS2 spectrum of a chromatographic peak using the m/z and intensity values of all MS2 chromatographic peaks whose peak shape is highly similar to that of the MS1 chromatographic peak.
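
The reconstruction idea in the paragraph above can be sketched with a few lines of base R. This is only a toy illustration of "keep the MS2 chromatographic peaks whose shape correlates with the MS1 peak", not the xcms implementation. The intensity profiles, the fragment m/z values, and the threshold `min_cor` are assumptions chosen for the example.

```r
# Toy illustration in base R; this is NOT the xcms code.
# Intensity profiles sampled at the same retention times (made-up numbers).
rt  <- seq(56, 64, by = 1)
ms1 <- dnorm(rt, mean = 60, sd = 1.2) * 1e4   # MS1 chromatographic peak

# Three hypothetical MS2 chromatographic peaks, named by fragment m/z.
ms2 <- list(
    "154.96" = dnorm(rt, mean = 60,   sd = 1.2) * 2400, # same shape as MS1
    "270.95" = dnorm(rt, mean = 60.1, sd = 1.3) * 1200, # nearly the same shape
    "427.90" = dnorm(rt, mean = 62.5, sd = 1.2) * 700   # shifted in rt
)

# Pearson correlation of each MS2 profile with the MS1 profile.
shape_cor <- vapply(ms2, function(y) cor(ms1, y), numeric(1))

# Keep fragments with a highly similar peak shape (threshold is an assumption).
min_cor <- 0.9
keep <- shape_cor >= min_cor

# The "reconstructed" spectrum: m/z and apex intensity of the kept MS2 peaks.
reconstructed <- data.frame(
    mz        = as.numeric(names(ms2)[keep]),
    intensity = vapply(ms2[keep], max, numeric(1))
)

print(round(shape_cor, 3))
print(reconstructed)
```

Here the two co-eluting fragments are kept and the rt-shifted one is dropped, since it most likely belongs to a different compound fragmented in the same wide window.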

It might be even better to ask Corey @cbroeckl directly, since he definitely has more experience with this type of data than I do.

eterlova commented 2 years ago

Oh I see.. Yes, I think this is how MSe works. Thank you for the advice!

Best, Lisa

cbroeckl commented 2 years ago

https://github.com/sneumann/xcms/issues/430

Long conversation here as background.

eterlova commented 2 years ago

@cbroeckl indeed! and thank you for replying! In that thread you mention that the workflow you had at the time is a workaround until @jorainer changes some of the functions, which he did. Does this mean that you don't separate MS1 and MSe files anymore and process them all together?

I have a few more questions about your workflow, not sure if I should list them all here? Or maybe you have some sort of office hour when I could call? Or I could open an issue over in the RAMClustR repository?

cbroeckl commented 2 years ago

@eterlova you can email me directly and I can help walk you through it or get a script to you.