sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

halfWindowSize errors #559

Open cbroeckl opened 3 years ago

cbroeckl commented 3 years ago

Hello all,

Trying to use continuum mode LC-TOF data in XCMS. I have implemented smoothing and peak picking, and visual examination suggests the following parameters are well suited to my data (LC, Waters Xevo G2-XS QTOF, i.e. the resulting centroids look plausible):

btp <- SnowParam(workers = 2)  ## SnowParam for Windows; MulticoreParam for Linux; SerialParam for no parallelization
register(btp)

xr <- xr %>%
  clean() %>%
  smooth(method = "SavitzkyGolay",
         halfWindowSize = 2,
         polynomialOrder = 3) %>%
  pickPeaks(refineMz = "kNeighbors", k = 3,
            halfWindowSize = 3,
            SNR = 5)

All goes well through XCMS centWave peak detection, correspondence, etc.: no error messages, peak detection looks plausible at first glance, and an object called 'xs' is created.

Problem 1: I am having trouble with fillChromPeaks; it seems to be related to BiocParallel.

> fpp <- FillChromPeaksParam(expandMz = 0, expandRt = 0, ppm = 0)
> xs <- fillChromPeaks(xs, param = fpp)
Defining peak areas for filling-in .... OK
Start integrating peak areas from original files
Error in result[[njob]] <- value : 
  attempt to select less than one element in OneIndex
In addition: Warning messages:
1: In serialize(data, node$con) :
  'package:stats' may not be available when loading
2: In serialize(data, node$con) :
  'package:stats' may not be available when loading
Error in serialize(data, node$con) : error writing to connection

When I switch to SerialParam() instead, I get a new error:

> btp <- SerialParam()
> xs <- fillChromPeaks(xs, param = fpp)
Defining peak areas for filling-in .... OK
Start integrating peak areas from original files
Requesting 8183 peaks from 20200915-ERYAN-BS-920-fastDDA__TOF_PH_Pos__1__001.mzML ... Error: cannot allocate vector of size 2.2 Gb  

This certainly could point to a memory issue: I am working on a computer with limited memory. Windows does report more than 2.2 Gb of available memory, but I suspect this is indeed a memory problem, and I am not sure it can be resolved without moving to a machine with more RAM.
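Before moving to a bigger machine, it may be worth freeing memory and retrying without parallel workers. A minimal sketch, assuming Windows with R ≤ 4.1 (where `memory.limit()` is still available); the `size` value below is a placeholder, not a recommendation:

```r
library(BiocParallel)

gc(reset = TRUE)            ## free unused memory before retrying
memory.limit()              ## report the current ceiling (MB, Windows only)
memory.limit(size = 16000)  ## placeholder: raise the ceiling if the machine allows

register(SerialParam())     ## serial avoids per-worker copies of the data
xs <- fillChromPeaks(xs, param = fpp)
```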

Problem 2: MS/MS data processing issues.
I am trying to use the MS/MS aggregation functionality, as this is DDA data:

> xf <- featureSpectra(
+   xs,
+   msLevel = 2,
+   expandRt = 0,
+   expandMz = 0.5,
+   ppm = 10,
+   method = c("all"),
+   skipFilled = TRUE,
+   return.type = "MSpectra"
+ )
Error: fun(object@intensity, halfWindowSize = halfWindowSize, ...) : ‘halfWindowSize’ is too large!
In addition: There were 46 warnings (use warnings() to see them)

> warnings()
Warning messages:
1: In smooth_Spectrum(x, method = match.arg(method), halfWindowSize = halfWindowSize,  ... :
  Negative intensities generated. Replaced by zeros.
2: In smooth_Spectrum(x, method = match.arg(method), halfWindowSize = halfWindowSize,  ... :

This is odd to me, because I used the same halfWindowSize values when peak picking the MS1 data for feature finding. Both are continuum data collected at the same MS resolving power. Moreover, if I try to reduce the halfWindowSize value, I run into new problems:

> xs@spectraProcessingQueue[[2]]@ARGS$halfWindowSize = 1
> xs@spectraProcessingQueue[[3]]@ARGS$halfWindowSize = 2
> xf <- featureSpectra(
+   xs,
+   msLevel = 2,
+   expandRt = 0,
+   expandMz = 0.2,
+   ppm = 20,
+   method = c("all"),
+   skipFilled = TRUE,
+   return.type = "MSpectra"
+ )
Error in solve.default(t(X) %*% X) : 
  system is computationally singular: reciprocal condition number = 1.19379e-18

> xs@spectraProcessingQueue[[2]]@ARGS$halfWindowSize = 1.5
> xs@spectraProcessingQueue[[3]]@ARGS$halfWindowSize = 2
> xf <- featureSpectra(
+   xs,
+   msLevel = 2,
+   expandRt = 0,
+   expandMz = 0.2,
+   ppm = 20,
+   method = c("all"),
+   skipFilled = TRUE,
+   return.type = "MSpectra"
+ )
Error in if (any(object@intensity < 0)) msg <- validMsg(msg, "Negative intensities found.") : 
  missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In y[(n - hws + 1L):n] <- tail(coef, hws) %*% tail(x, w) :
  number of items to replace is not a multiple of replacement length
2: In smooth_Spectrum(x, method = match.arg(method), halfWindowSize = halfWindowSize,  :
  Negative intensities generated. Replaced by zeros.

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 17134)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] magrittr_2.0.1      xcms_3.12.0         BiocParallel_1.24.1 MSnbase_2.16.1     
 [5] ProtGenerics_1.22.0 S4Vectors_0.28.1    mzR_2.24.1          Rcpp_1.0.6         
 [9] Biobase_2.50.0      BiocGenerics_0.36.1

loaded via a namespace (and not attached):
 [1] bitops_1.0-6                matrixStats_0.58.0          fs_1.5.0                   
 [4] usethis_2.0.1               devtools_2.4.0              doParallel_1.0.16          
 [7] RColorBrewer_1.1-2          rprojroot_2.0.2             GenomeInfoDb_1.26.7        
[10] tools_4.0.4                 utf8_1.2.1                  R6_2.5.0                   
[13] affyio_1.60.0               DBI_1.1.1                   colorspace_2.0-0           
[16] withr_2.4.2                 tidyselect_1.1.0            gridExtra_2.3              
[19] prettyunits_1.1.1           processx_3.5.1              MassSpecWavelet_1.56.0     
[22] compiler_4.0.4              preprocessCore_1.52.1       cli_2.4.0                  
[25] DelayedArray_0.16.3         desc_1.3.0                  scales_1.1.1               
[28] DEoptimR_1.0-8              robustbase_0.93-7           affy_1.68.0                
[31] callr_3.6.0                 digest_0.6.27               XVector_0.30.0             
[34] pkgconfig_2.0.3             sessioninfo_1.1.1           MatrixGenerics_1.2.1       
[37] fastmap_1.1.0               limma_3.46.0                rlang_0.4.10               
[40] impute_1.64.0               generics_0.1.0              mzID_1.28.0                
[43] dplyr_1.0.5                 RCurl_1.98-1.3              GenomeInfoDbData_1.2.4     
[46] Matrix_1.3-2                MALDIquant_1.19.3           munsell_0.5.0              
[49] fansi_0.4.2                 MsCoreUtils_1.2.0           lifecycle_1.0.0            
[52] vsn_3.58.0                  MASS_7.3-53.1               SummarizedExperiment_1.20.0
[55] zlibbioc_1.36.0             pkgbuild_1.2.0              plyr_1.8.6                 
[58] grid_4.0.4                  crayon_1.4.1                lattice_0.20-41            
[61] ps_1.6.0                    pillar_1.6.0                GenomicRanges_1.42.0       
[64] codetools_0.2-18            pkgload_1.2.1               XML_3.99-0.6               
[67] glue_1.4.2                  pcaMethods_1.82.0           remotes_2.3.0              
[70] BiocManager_1.30.12         vctrs_0.3.7                 foreach_1.5.1              
[73] testthat_3.0.2              RANN_2.6.1                  gtable_0.3.0               
[76] purrr_0.3.4                 assertthat_0.2.1            cachem_1.0.4               
[79] ggplot2_3.3.3               ncdf4_1.17                  snow_0.4-3                 
[82] tibble_3.1.1                iterators_1.0.13            memoise_2.0.0              
[85] IRanges_2.24.1              ellipsis_0.3.1       
jorainer commented 3 years ago

For the first error: yes, indeed, that looks like a memory problem. What might also help is to perform the centroiding and export the result as an mzML file; any further data processing will then be faster. You could eventually have a look here, specifically at the centroiding.R script, which I use to centroid all of our profile-mode data. What I do there is: load each single profile-mode file, perform the centroiding, and export that file again.

And also note (maybe relevant for point 2) that I use different centroiding settings for MS1 and MS2 data, since I ran into the same issues. In MS2 data (on our instrument) we have far fewer peaks, so I had to change the smoothing settings (actually, drop the smoothing altogether). You can centroid MS1 and MS2 data separately by specifying the MS level on which to apply the processing with msLevel. = 1L or msLevel. = 2L (again, have a look at the centroiding.R script).

cbroeckl commented 3 years ago

@jorainer - thanks. I didn't try removing the smoothing and will give that a go. I guess I was just puzzled that I couldn't find any settings that worked without an error. The sparseness is certainly the most likely explanation.