Closed cyan20200410 closed 2 years ago
There's not much we can do without a reproducible example, I'm afraid.
I would also advise to look into the Spectra package, which is the one we put our efforts into now.
There's not much we can do without a reproducible example, I'm afraid.
I would also advise to look into the Spectra package, which is the one we put our efforts into now.
Hi Igatto
Thank you for replying.
I am working on this excise here: https://github.com/DorresteinLaboratory/XCMS3_FeatureBasedMN
I used the script from 'XCMS3_Preprocessing'
Is that useful to look into my case?
I'll have a look into that.
I will not do the exercise. I will only have a look of you provide me with the prepared data.
The reason for the error is most likely the difference between the MSnbase::combineSpectra
and the newer Spectra::combineSpectra
. I'll update the tutorial to fix that.
Ok, thanks. Closing now, but feel free to open up again if needed.
The reason for the error is most likely the difference between the
MSnbase::combineSpectra
and the newerSpectra::combineSpectra
. I'll update the tutorial to fix that.
Thank you. I will wait for that.
I will not do the exercise. I will only have a look of you provide me with the prepared data.
Hi Igatti, below is the code I used:
I will not do the exercise. I will only have a look of you provide me with the prepared data.
Hi Igatti, below is the code I used:
Note: This script is optimized for the latest R version (>3.6). Be sure to get the most recent version of R before running. The code below installs the required packages and all needed dependencies. Note: You don't have to run this chunk if the packages are already installed.
if (!requireNamespace("BiocManager"))
install.packages("BiocManager")
BiocManager::install(c("xcms", "CAMERA"))
To follow this example tutorial, download the folder named peak/AMG_Plant_subset from: here
library(RCurl) url = "ftp://massive.ucsd.edu/MSV000082678/ccms_peak/AMG_Plant_subset/30_plants_or_more/" filenames1 = getURL(url, ftp.use.epsv = FALSE, dirlistonly = TRUE) filenames1 <- strsplit(filenames1, "\r\n") filenames1 = unlist(filenames1)
for (filename in filenames1) { download.file(paste(url, filename, sep = ""), paste(getwd(), "/", filename,sep = ""))}
Note that the settings for xcms
used in this tutorial were not optimized,
specifically the alignment based on the default obiwarp parameters might
perform a little to strong retention time adjustment.
For more information on optimization of the parameters see the xcms vignette
or the LC-MS data pre-processing with xcms workshop.
Load required libraries and utility functions for GNPS export.
library(xcms)
source("https://raw.githubusercontent.com/jorainer/xcms-gnps-tools/master/customFunctions.R")
Use socket based parallel processing on Windows systems. The number (4
)
defines the number of parallel tasks. Adapt this setting to the number of CPUs
available on your system. Also note that it is usually better to not use all
CPUs of a system as a) during the analysis the MS data has to be imported from
the original mzML/mzXML/CDF files and it will thus be limited by the I/O of the
hard disks and b) the computer needs to have enough memory to load the complete
MS data of as many raw data files than there are parallel jobs.
if (.Platform$OS.type == "unix") {
register(bpstart(MulticoreParam(4)))
} else {
register(bpstart(SnowParam(4)))
}
Load all .mzXML files and define sample grouping. Note that for this example we assign all samples to the same group. This should be changed according to the experimental setup.
mzMLfiles <- paste0('AMG_Plant_subset/',
list.files(path = 'AMG_Plant_subset/',
pattern = ".mzXML$", recursive = TRUE))
s_groups <- rep("sample", length(mzMLfiles))
pheno <- data.frame(
sample_name = sub(basename(mzMLfiles),
pattern = ".mzML",replacement = "", fixed = TRUE),
sample_group = s_groups, stringsAsFactors = FALSE)
head(pheno)
Read all raw data (which includes MS1 and MS2 spectra).
rawData <- readMSData(files = mzMLfiles, centroided. = TRUE, mode = "onDisk",
pdata = new("NAnnotatedDataFrame", pheno))
Create a base peak chromatogram (BPC) for visual inspection.
bpis <- chromatogram(rawData, aggregationFun = "max")
plot(bpis)
### Peak picking
Define settings for the centWave peak detection. As mentioned in the
introduction, these settings should always be adapted to the analyzed data set.
```{r}
cwp <- CentWaveParam(snthresh = 5, noise = 1000, peakwidth = c(3, 30), ppm = 20)
Perform the chromatographic peak detection using centWave.
processedData <- findChromPeaks(rawData, param = cwp)
Get an overview of the detected peaks, using a heatmap which represents the number of peaks detected for each file along the retention time range.
plotChromPeakImage(processedData, binSize = 10)
Do an obiwarp-based alignment using the default settings (ideally adapt settings to the analyzed data set).
processedData <- adjustRtime(processedData, param = ObiwarpParam())
Plot the difference between adjusted and raw retention times.
plotAdjustedRtime(processedData)
Define the parameters for the peak density-based peak grouping (correspondence analysis).
pdp <- PeakDensityParam(sampleGroups = processedData$sample_group,
minFraction = 0.10)
processedData <- groupChromPeaks(processedData, param = pdp)
Fill-in missing peaks. Peak detection might have failed for some features in
some samples. The fillChromPeaks
function allows to integrate for such cases
all signal in the respective m/z - retention time range. Below we first define
the median width of identified chromatographic peaks in retention time dimension
and use this as parameter fixedRt
for the fillChromPeaks
.
medWidth <- median(chromPeaks(processedData)[, "rtmax"] -
chromPeaks(processedData)[, "rtmin"])
## fill missing peaks
processed_Data <- fillChromPeaks(
processedData, param = FillChromPeaksParam(fixedRt = medWidth))
Below we use the featureSpectra
function to extract all MS2 spectra with their
precursor m/z being within the m/z range of a feature/peak and their retention
time within the rt range of the same feature/peak. Note that for older xcms
versions (i.e. before version 3.12) return.type = "Spectra"
has to be used
instead of return.type = "MSpectra"
as in the example below. Zero-intensity
values are removed from each spectrum with the clean
function, and
subsequently processed into the expected format using the formatSpectraForGNPS
function.
## export the individual spectra into a .mgf file
filteredMs2Spectra <- featureSpectra(processedData, return.type = "MSpectra")
filteredMs2Spectra <- clean(filteredMs2Spectra, all = TRUE)
filteredMs2Spectra <- formatSpectraForGNPS(filteredMs2Spectra)
The extracted MS2 spectra are saved as ms2spectra_all.mgf file. This file can for example be used to do in silico structure prediction through SIRIUS+CSI:FingerID.
writeMgfData(filteredMs2Spectra, "ms2spectra_all.mgf")
Export peak area quantification table. To this end we first extract the feature
definitions (i.e. the m/z and retention time ranges and other metadata for all
defined features in the data set) and then the integrated peak areas (with the
featureValues
function). This peak area quantification table contains features
and respective per sample peak areas in columns. The combined data is then saved
to the file xcms_all.txt. Note that it is now also possible to use the entire
feature table in the FBMN workflow.
## get feature definitions and intensities
featuresDef <- featureDefinitions(processedData)
featuresIntensities <- featureValues(processedData, value = "into")
## generate data table
dataTable <- merge(featuresDef, featuresIntensities, by = 0, all = TRUE)
dataTable <- dataTable[, !(colnames(dataTable) %in% c("peakidx"))]
head(dataTable)
write.table(dataTable, "xcms_all.txt", sep = "\t", quote = FALSE,
row.names = FALSE)
The filteredMs2Spectra
contains all MS2 spectra with their precursor m/z
within the feature's m/z range and a retention time that is within the retention
time of the chromatographic peak/feature. We thus have multiple MS2 spectra for
each feature (also from each sample). Metadata column "feature_id"
indicates
to which feature a MS2 spectrum belongs:
filteredMs2Spectra
We next select a single MS2 spectrum for each feature and export this reduced
set also as an .mgf file. We use the combineSpectra
function on the list of
spectra and specify with fcol = "feature_id"
how the spectra are grouped
(i.e. all spectra with the same feature id are processed together). On the set
of spectra of the same feature we apply the maxTic
function that simply
returns the spectrum with the largest sum of intensities. We thus select with
the code below the spectrum with the largest total signal as the
representative MS2 spectrum for each feature.
## Select for each feature the Spectrum2 with the largest TIC.
filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra,
fcol = "feature_id",
method = maxTic)
You have loaded both the Spectra
and MSnbase
packages in your R session. Both define a combineSpectra
method that have however different parameters. You can fix the error by specifically calling the implementation of the MSnbase
package with MSnbase::combineSpectra
instead of combineSpectra
.
You have loaded both the
Spectra
andMSnbase
packages in your R session. Both define acombineSpectra
method that have however different parameters. You can fix the error by specifically calling the implementation of theMSnbase
package withMSnbase::combineSpectra
instead ofcombineSpectra
.
Hi Jorainer
Thank you. It is working now.
Qing
Hi everyone, I have an error msg while using 'combineSpectra', Could anyone help, please?
Code i used
filteredMs2Spectra_maxTic <- combineSpectra(filteredMs2Spectra, fcol = "feature_id", method = maxTic)
Error i recived
Error in combineSpectra(filteredMs2Spectra, fcol = "feature_id", method = maxTic) : length of 'f' and 'p' have to match length of 'x'
Session info
R version 4.1.2 (2021-11-01) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252
attached base packages: [1] stats4 stats graphics grDevices utils datasets methods base
other attached packages: [1] pander_0.6.4 BiocStyle_2.22.0 Spectra_1.4.1
[4] magrittr_2.0.1 xcms_3.16.1 BiocParallel_1.28.3 [7] MSnbase_2.20.1 ProtGenerics_1.26.0 S4Vectors_0.32.3
[10] mzR_2.28.0 Rcpp_1.0.7 Biobase_2.54.0
[13] BiocGenerics_0.40.0
loaded via a namespace (and not attached): [1] MatrixGenerics_1.6.0 vsn_3.62.0
[3] foreach_1.5.1 assertthat_0.2.1
[5] BiocManager_1.30.16 affy_1.72.0
[7] GenomeInfoDbData_1.2.7 yaml_2.2.1
[9] robustbase_0.93-9 impute_1.68.0
[11] pillar_1.6.4 lattice_0.20-45
[13] glue_1.6.0 limma_3.50.0
[15] digest_0.6.29 GenomicRanges_1.46.1
[17] RColorBrewer_1.1-2 XVector_0.34.0
[19] colorspace_2.0-2 htmltools_0.5.2
[21] preprocessCore_1.56.0 Matrix_1.4-0
[23] plyr_1.8.6 MALDIquant_1.21
[25] XML_3.99-0.8 pkgconfig_2.0.3
[27] zlibbioc_1.40.0 purrr_0.3.4
[29] snow_0.4-4 scales_1.1.1
[31] RANN_2.6.1 affyio_1.64.0
[33] tibble_3.1.6 generics_0.1.1
[35] IRanges_2.28.0 ggplot2_3.3.5
[37] ellipsis_0.3.2 SummarizedExperiment_1.24.0 [39] cli_3.1.0 MassSpecWavelet_1.60.0
[41] crayon_1.4.2 evaluate_0.14
[43] fs_1.5.2 ncdf4_1.19
[45] fansi_0.5.0 doParallel_1.0.16
[47] MASS_7.3-54 MsFeatures_1.2.0
[49] tools_4.1.2 lifecycle_1.0.1
[51] matrixStats_0.61.0 munsell_0.5.0
[53] cluster_2.1.2 DelayedArray_0.20.0
[55] pcaMethods_1.86.0 compiler_4.1.2
[57] GenomeInfoDb_1.30.0 mzID_1.32.0
[59] rlang_0.4.12 grid_4.1.2
[61] RCurl_1.98-1.5 rstudioapi_0.13
[63] iterators_1.0.13 MsCoreUtils_1.6.0
[65] bitops_1.0-7 rmarkdown_2.11
[67] gtable_0.3.0 codetools_0.2-18
[69] DBI_1.1.2 R6_2.5.1
[71] knitr_1.37 dplyr_1.0.7
[73] fastmap_1.1.0 utf8_1.2.2
[75] clue_0.3-60 parallel_4.1.2
[77] vctrs_0.3.8 DEoptimR_1.0-9
[79] tidyselect_1.1.1 xfun_0.29