lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/
123 stars 50 forks source link

Error msg while using 'combineSpectra' #566

Closed cyan20200410 closed 2 years ago

cyan20200410 commented 2 years ago

Hi everyone, I have an error msg while using 'combineSpectra', Could anyone help, please?

Code i used

filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra, fcol = "feature_id", method = maxTic)

Error i recived

Error in combineSpectra(filteredMs2Spectra, fcol = "feature_id", method = maxTic) : length of 'f' and 'p' have to match length of 'x'

Session info

R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base

other attached packages: [1] pander_0.6.4 BiocStyle_2.22.0 Spectra_1.4.1
[4] magrittr_2.0.1 xcms_3.16.1 BiocParallel_1.28.3 [7] MSnbase_2.20.1 ProtGenerics_1.26.0 S4Vectors_0.32.3
[10] mzR_2.28.0 Rcpp_1.0.7 Biobase_2.54.0
[13] BiocGenerics_0.40.0

loaded via a namespace (and not attached): [1] MatrixGenerics_1.6.0 vsn_3.62.0
[3] foreach_1.5.1 assertthat_0.2.1
[5] BiocManager_1.30.16 affy_1.72.0
[7] GenomeInfoDbData_1.2.7 yaml_2.2.1
[9] robustbase_0.93-9 impute_1.68.0
[11] pillar_1.6.4 lattice_0.20-45
[13] glue_1.6.0 limma_3.50.0
[15] digest_0.6.29 GenomicRanges_1.46.1
[17] RColorBrewer_1.1-2 XVector_0.34.0
[19] colorspace_2.0-2 htmltools_0.5.2
[21] preprocessCore_1.56.0 Matrix_1.4-0
[23] plyr_1.8.6 MALDIquant_1.21
[25] XML_3.99-0.8 pkgconfig_2.0.3
[27] zlibbioc_1.40.0 purrr_0.3.4
[29] snow_0.4-4 scales_1.1.1
[31] RANN_2.6.1 affyio_1.64.0
[33] tibble_3.1.6 generics_0.1.1
[35] IRanges_2.28.0 ggplot2_3.3.5
[37] ellipsis_0.3.2 SummarizedExperiment_1.24.0 [39] cli_3.1.0 MassSpecWavelet_1.60.0
[41] crayon_1.4.2 evaluate_0.14
[43] fs_1.5.2 ncdf4_1.19
[45] fansi_0.5.0 doParallel_1.0.16
[47] MASS_7.3-54 MsFeatures_1.2.0
[49] tools_4.1.2 lifecycle_1.0.1
[51] matrixStats_0.61.0 munsell_0.5.0
[53] cluster_2.1.2 DelayedArray_0.20.0
[55] pcaMethods_1.86.0 compiler_4.1.2
[57] GenomeInfoDb_1.30.0 mzID_1.32.0
[59] rlang_0.4.12 grid_4.1.2
[61] RCurl_1.98-1.5 rstudioapi_0.13
[63] iterators_1.0.13 MsCoreUtils_1.6.0
[65] bitops_1.0-7 rmarkdown_2.11
[67] gtable_0.3.0 codetools_0.2-18
[69] DBI_1.1.2 R6_2.5.1
[71] knitr_1.37 dplyr_1.0.7
[73] fastmap_1.1.0 utf8_1.2.2
[75] clue_0.3-60 parallel_4.1.2
[77] vctrs_0.3.8 DEoptimR_1.0-9
[79] tidyselect_1.1.1 xfun_0.29

lgatto commented 2 years ago

There's not much we can do without a reproducible example, I'm afraid.

I would also advise to look into the Spectra package, which is the one we put our efforts into now.

cyan20200410 commented 2 years ago

There's not much we can do without a reproducible example, I'm afraid.

I would also advise to look into the Spectra package, which is the one we put our efforts into now.

Hi Igatto

Thank you for replying.

I am working on this excise here: https://github.com/DorresteinLaboratory/XCMS3_FeatureBasedMN

I used the script from 'XCMS3_Preprocessing'

Is that useful to look into my case?

jorainer commented 2 years ago

I'll have a look into that.

lgatto commented 2 years ago

I will not do the exercise. I will only have a look of you provide me with the prepared data.

jorainer commented 2 years ago

The reason for the error is most likely the difference between the MSnbase::combineSpectra and the newer Spectra::combineSpectra. I'll update the tutorial to fix that.

lgatto commented 2 years ago

Ok, thanks. Closing now, but feel free to open up again if needed.

cyan20200410 commented 2 years ago

The reason for the error is most likely the difference between the MSnbase::combineSpectra and the newer Spectra::combineSpectra. I'll update the tutorial to fix that.

Thank you. I will wait for that.

cyan20200410 commented 2 years ago

I will not do the exercise. I will only have a look of you provide me with the prepared data.

Hi Igatti, below is the code I used:

I will not do the exercise. I will only have a look of you provide me with the prepared data.

Hi Igatti, below is the code I used:

Install required packages

Note: This script is optimized for the latest R version (>3.6). Be sure to get the most recent version of R before running. The code below installs the required packages and all needed dependencies. Note: You don't have to run this chunk if the packages are already installed.

if (!requireNamespace("BiocManager"))
    install.packages("BiocManager")
BiocManager::install(c("xcms", "CAMERA"))

Preprocess your data using XCMS3 and export data files for feature-based molecular networking through GNPS

To follow this example tutorial, download the folder named peak/AMG_Plant_subset from: here

library(RCurl) url = "ftp://massive.ucsd.edu/MSV000082678/ccms_peak/AMG_Plant_subset/30_plants_or_more/" filenames1 = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) filenames1 <- strsplit(filenames1, "\r\n") filenames1 = unlist(filenames1)

for (filename in filenames1) { download.file(paste(url, filename, sep = ""), paste(getwd(), "/", filename,sep = ""))}

Note that the settings for xcms used in this tutorial were not optimized, specifically the alignment based on the default obiwarp parameters might perform a little to strong retention time adjustment. For more information on optimization of the parameters see the xcms vignette or the LC-MS data pre-processing with xcms workshop.

Load required libraries and utility functions for GNPS export.

library(xcms)
source("https://raw.githubusercontent.com/jorainer/xcms-gnps-tools/master/customFunctions.R")

Use socket based parallel processing on Windows systems. The number (4) defines the number of parallel tasks. Adapt this setting to the number of CPUs available on your system. Also note that it is usually better to not use all CPUs of a system as a) during the analysis the MS data has to be imported from the original mzML/mzXML/CDF files and it will thus be limited by the I/O of the hard disks and b) the computer needs to have enough memory to load the complete MS data of as many raw data files than there are parallel jobs.

if (.Platform$OS.type == "unix") {
  register(bpstart(MulticoreParam(4)))
} else {
  register(bpstart(SnowParam(4)))
} 

Load data

Load all .mzXML files and define sample grouping. Note that for this example we assign all samples to the same group. This should be changed according to the experimental setup.


mzMLfiles <- paste0('AMG_Plant_subset/',
                    list.files(path = 'AMG_Plant_subset/',
                               pattern = ".mzXML$", recursive = TRUE))
s_groups <- rep("sample", length(mzMLfiles))
pheno <- data.frame(
    sample_name = sub(basename(mzMLfiles),
                      pattern = ".mzML",replacement = "", fixed = TRUE),
    sample_group = s_groups, stringsAsFactors = FALSE)
head(pheno)

Read all raw data (which includes MS1 and MS2 spectra).


rawData <- readMSData(files = mzMLfiles, centroided. = TRUE, mode = "onDisk",
                      pdata = new("NAnnotatedDataFrame", pheno))

Create a base peak chromatogram (BPC) for visual inspection.

bpis <- chromatogram(rawData, aggregationFun = "max")
plot(bpis)

### Peak picking

Define settings for the centWave peak detection. As mentioned in the
introduction, these settings should always be adapted to the analyzed data set.

```{r}
cwp <- CentWaveParam(snthresh = 5, noise = 1000, peakwidth = c(3, 30), ppm = 20)

Perform the chromatographic peak detection using centWave.

processedData <- findChromPeaks(rawData, param = cwp)

Get an overview of the detected peaks, using a heatmap which represents the number of peaks detected for each file along the retention time range.

plotChromPeakImage(processedData, binSize = 10) 

Retention time alignment

Do an obiwarp-based alignment using the default settings (ideally adapt settings to the analyzed data set).

processedData <- adjustRtime(processedData, param = ObiwarpParam())

Plot the difference between adjusted and raw retention times.

plotAdjustedRtime(processedData)

Peak grouping

Define the parameters for the peak density-based peak grouping (correspondence analysis).

pdp <- PeakDensityParam(sampleGroups = processedData$sample_group,
                        minFraction = 0.10)
processedData <- groupChromPeaks(processedData, param = pdp) 

Gap filling

Fill-in missing peaks. Peak detection might have failed for some features in some samples. The fillChromPeaks function allows to integrate for such cases all signal in the respective m/z - retention time range. Below we first define the median width of identified chromatographic peaks in retention time dimension and use this as parameter fixedRt for the fillChromPeaks.

medWidth <- median(chromPeaks(processedData)[, "rtmax"] -
                   chromPeaks(processedData)[, "rtmin"])
## fill missing peaks
processed_Data <- fillChromPeaks(
    processedData, param = FillChromPeaksParam(fixedRt = medWidth))

Export data

export MS1 and MS2 features

Below we use the featureSpectra function to extract all MS2 spectra with their precursor m/z being within the m/z range of a feature/peak and their retention time within the rt range of the same feature/peak. Note that for older xcms versions (i.e. before version 3.12) return.type = "Spectra" has to be used instead of return.type = "MSpectra" as in the example below. Zero-intensity values are removed from each spectrum with the clean function, and subsequently processed into the expected format using the formatSpectraForGNPS function.

## export the individual spectra into a .mgf file
filteredMs2Spectra <- featureSpectra(processedData, return.type = "MSpectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)

The extracted MS2 spectra are saved as ms2spectra_all.mgf file. This file can for example be used to do in silico structure prediction through SIRIUS+CSI:FingerID.

writeMgfData(filteredMs2Spectra, "ms2spectra_all.mgf")

Export peak area quantification table. To this end we first extract the feature definitions (i.e. the m/z and retention time ranges and other metadata for all defined features in the data set) and then the integrated peak areas (with the featureValues function). This peak area quantification table contains features and respective per sample peak areas in columns. The combined data is then saved to the file xcms_all.txt. Note that it is now also possible to use the entire feature table in the FBMN workflow.

## get feature definitions and intensities
featuresDef <- featureDefinitions(processedData)
featuresIntensities <- featureValues(processedData, value = "into")

## generate data table
dataTable <- merge(featuresDef, featuresIntensities, by = 0, all = TRUE)
dataTable <- dataTable[, !(colnames(dataTable) %in% c("peakidx"))]
head(dataTable)
write.table(dataTable, "xcms_all.txt", sep = "\t", quote = FALSE, 
            row.names = FALSE)

Export MS2 features only

The filteredMs2Spectra contains all MS2 spectra with their precursor m/z within the feature's m/z range and a retention time that is within the retention time of the chromatographic peak/feature. We thus have multiple MS2 spectra for each feature (also from each sample). Metadata column "feature_id" indicates to which feature a MS2 spectrum belongs:

filteredMs2Spectra

We next select a single MS2 spectrum for each feature and export this reduced set also as an .mgf file. We use the combineSpectra function on the list of spectra and specify with fcol = "feature_id" how the spectra are grouped (i.e. all spectra with the same feature id are processed together). On the set of spectra of the same feature we apply the maxTic function that simply returns the spectrum with the largest sum of intensities. We thus select with the code below the spectrum with the largest total signal as the representative MS2 spectrum for each feature.


## Select for each feature the Spectrum2 with the largest TIC.
filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra,
                                            fcol = "feature_id",
                                            method = maxTic)
jorainer commented 2 years ago

You have loaded both the Spectra and MSnbase packages in your R session. Both define a combineSpectra method that have however different parameters. You can fix the error by specifically calling the implementation of the MSnbase package with MSnbase::combineSpectra instead of combineSpectra.

cyan20200410 commented 2 years ago

You have loaded both the Spectra and MSnbase packages in your R session. Both define a combineSpectra method that have however different parameters. You can fix the error by specifically calling the implementation of the MSnbase package with MSnbase::combineSpectra instead of combineSpectra.

Hi Jorainer

Thank you. It is working now.

Qing