sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis
Other
181 stars 80 forks source link

findChromePeaks Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time! #384

Open davidgarciam opened 5 years ago

davidgarciam commented 5 years ago

Hi! I would appreciate your help with this error, thanks:

Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time!

I have followed the code from the vignettes, adjusting the parameters to fit my data

rm(list=ls(all=TRUE))

library(pander) library(RColorBrewer) library(mzR) library(msdata) library(xcms) library(faahKO) library(pheatmap) library(magrittr) library(ChemoSpec) library(speaq)

1. Data import----

Get the full path to the .mzData files

setwd("C:/Users/ASUS/Documents/GCMS_chemometrics/Pyrolysis_project/Raw_data/PCA-Bio-gases/BG-Normal-Temperatura/")

pd <-read.csv("Metatemp.csv", header= TRUE, sep= ";")

filepath <- "C:/Users/ASUS/Documents/GCMS_chemometrics/Pyrolysis_project/Raw_data/PCA-Bio-gases/BG-Normal-Temperatura/GC/"

cdfs <- list.files(filepath, pattern = "mzData", full.names=TRUE, recursive = TRUE)

raw_data <- readMSData(files = cdfs, pdata = new("NAnnotatedDataFrame", pd), mode = "onDisk")

resulting OnDiskMSnExp object contains general information about the number of spectra, retention times, the measured total ion current etc, but does not contain the full raw data (i.e. the m/z and intensity values from each measured spectrum).

2. Initial data inspection----

head(rtime(raw_data)) mzs <- mz(raw_data) #mz data from raw files

mzs_by_file <- split(mzs, f = fromFile(raw_data))

length(mzs_by_file) ##Split the list by file

Get the base peak chromatograms. This reads data from the files.

bpis <- chromatogram(raw_data, aggregationFun = "max")

Define colors for the groups

group_colors <- paste0(brewer.pal(3, "Set1")[1:3], "60")

names(group_colors) <- c("A", "B","D")

Plot all chromatograms.

plot(bpis, col= group_colors[raw_data$Class], xlim= c(0,200)) legend("topright", legend= c("A", "B", "D"), fill= group_colors)

plot(bpis, col= group_colors[raw_data$Class], xlim= c(84,90)) legend("topright", legend= c("A", "B", "D"), fill= group_colors)

Define the rt and m/z range of the peak area

rtr <- c(85, 88.5) mzr <- c(7.85, 183.80)

extract the chromatogram

chr_raw <- chromatogram(raw_data, mz = mzr, rt = rtr) plot(chr_raw, col = group_colors[chr_raw$Class])

raw_data %>% filterRt(rt = rtr) %>% filterMz(mz = mzr) %>% plot(type = "XIC")

cwp <- CentWaveParam(peakwidth = c(3.5, 5), noise = 500000) xdata <- findChromPeaks(raw_data, param = cwp)

xdata <- findChromPeaks(raw_data, param = cwp) Error: BiocParallel errors element index: 1, 2, 3, 4, 5, 6, ... first error: Spectra are not ordered by retention time! In addition: Warning messages: 1: In serialize(data, node$con) : 'package:stats' may not be available when loading 2: In serialize(data, node$con) : 'package:stats' may not be available when loading

This the traceback

traceback() 6: stop(.error_bplist(res)) 5: bplapply(do.call("lapply", args), FUN = findChromPeaks_OnDiskMSnExp, method = "centWave", param = param, BPPARAM = BPPARAM) 4: bplapply(do.call("lapply", args), FUN = findChromPeaks_OnDiskMSnExp, method = "centWave", param = param, BPPARAM = BPPARAM) 3: .local(object, param, ...) 2: findChromPeaks(raw_data, param = cwp) 1: findChromPeaks(raw_data, param = cwp)

And the session info:

sessionInfo() R version 3.5.0 (2018-04-23) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale: [1] LC_COLLATE=Spanish_Colombia.1252 LC_CTYPE=Spanish_Colombia.1252 LC_MONETARY=Spanish_Colombia.1252 [4] LC_NUMERIC=C LC_TIME=Spanish_Colombia.1252

attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] speaq_2.5.0 ChemoSpec_5.0.229 ChemoSpecUtils_0.2.211 magrittr_1.5
[5] pheatmap_1.0.12 faahKO_1.22.0 msdata_0.22.0 RColorBrewer_1.1-2
[9] pander_0.6.3 xcms_3.4.4 BiocParallel_1.16.6 MSnbase_2.8.3
[13] ProtGenerics_1.14.0 S4Vectors_0.20.1 mzR_2.16.2 Rcpp_1.0.1
[17] Biobase_2.42.0 BiocGenerics_0.28.0

loaded via a namespace (and not attached): [1] doParallel_1.0.14 httr_1.4.0 tools_3.5.0 R6_2.4.0
[5] affyio_1.52.0 lazyeval_0.2.2 colorspace_1.4-1 tidyselect_0.2.5
[9] gridExtra_2.3 mQTL_1.0 compiler_3.5.0 MassSpecWavelet_1.48.1 [13] preprocessCore_1.44.0 rvest_0.3.3 xml2_1.2.0 plotly_4.9.0
[17] scales_1.0.0 DEoptimR_1.0-8 robustbase_0.93-4 randomForest_4.6-14
[21] affy_1.60.0 stringr_1.4.0 digest_0.6.18 colorRamps_2.3
[25] pkgconfig_2.0.2 htmltools_0.3.6 itertools_0.1-3 limma_3.38.3
[29] htmlwidgets_1.3 rlang_0.3.4 rstudioapi_0.10 impute_1.56.0
[33] jsonlite_1.6 mzID_1.20.1 dplyr_0.8.0.1 qtl_1.44-9
[37] MetaboMate_0.0.0.9002 MALDIquant_1.19.3 Matrix_1.2-14 munsell_0.5.0
[41] RcppZiggurat_0.1.5 vsn_3.50.0 stringi_1.4.3 pROC_1.14.0
[45] yaml_2.2.0 MASS_7.3-49 zlibbioc_1.28.0 plyr_1.8.4
[49] grid_3.5.0 ggrepel_0.8.0 crayon_1.3.4 doSNOW_1.0.16
[53] lattice_0.20-38 splines_3.5.0 multtest_2.38.0 pillar_1.3.1
[57] reshape2_1.4.3 codetools_0.2-15 XML_3.98-1.19 glue_1.3.1
[61] outliers_0.14 pcaMethods_1.74.0 data.table_1.12.2 BiocManager_1.30.4
[65] nloptr_1.2.1 missForest_1.4 foreach_1.4.4 tidyr_0.8.3
[69] gtable_0.3.0 RANN_2.6.1 purrr_0.3.2 assertthat_0.2.1
[73] ggplot2_3.1.1 Rfast_1.9.3 viridisLite_0.3.0 ncdf4_1.16.1
[77] survival_2.41-3 tibble_2.1.1 snow_0.4-3 iterators_1.0.10
[81] ptw_1.9-13 IRanges_2.16.0 ellipse_0.4.1 cluster_2.0.7-1

jorainer commented 5 years ago

Hi @davidgarciam ! Now that's an error I have not seen before. According to the error message the spectra in (one or some of) your data files are not ordered increasingly by retention time.

First let's find out which of your files are affected - then we can dig into that to see how the data looks like. So, can you please do the following and post the output?

tmp <- split(raw_data, f = fromFile(raw_data))
vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1))

those with a TRUE are the trouble makers as their retention times are not ordered.

sneumann commented 5 years ago

Hi, @davidgarciam, in addition, you can look at the scan times. If you have a very fast scanning GC/MS, maybe some retention times are identical ? That still counts as non-monotonously increasing. If you convert to the mzData format with proteowizard msconvert, you can instruct the tool to re-order according to retention time. Yours, Steffen

davidgarciam commented 5 years ago

Thanks for your comments professor @jorainer and @sneumann .

OK it seems that all the files but number 3 have problems with the order of rt:

tmp <- split(raw_data, f = fromFile(raw_data)) vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1)) 1 2 3 4 5 6 7 8 TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE

So, I need to order my data according to the rt, but it refers to the total length of the rt or the order of the rt itself inside each file?...

The length of the rt in the chromatograms are different:

a length_rt t1 t2 t3 t4 t5 t6 PMP-250-1 1444 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-250-2 1788 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-325-1 1239 6 6.49998 7.00002 7.5 7.99998 8.50002 PMP-325-2 1553 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-325-3 1286 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-1 517 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-2 526 0 6.00000 7.00002 7.5 7.99998 8.50002 PMP-370-3 1548 0 6.00000 7.00002 7.5 7.99998 8.50002

or the problem is here?:

head(rtime(bpis[1,1]))

F1.S0002 F1.S0001 F1.S0003 F1.S0004 F1.S0005 F1.S0006 0.00000 6.00000 7.00002 7.50000 7.99998 8.50002

head(rtime(bpis[1,2])) F2.S0002 F2.S0001 F2.S0003 F2.S0004 F2.S0005 F2.S0006 0.00000 6.00000 7.00002 7.50000 7.99998 8.50002 head(rtime(bpis[1,3])) F3.S0001 F3.S0002 F3.S0003 F3.S0004 F3.S0005 F3.S0006 6.00000 6.49998 7.00002 7.50000 7.99998 8.50002

sneumann commented 5 years ago

Hi, I guess the RTs are somewhere not increasing. The first look fine: 6.00000 7.00002 7.5 7.99998 8.50002, but can you plot these 1444 numbers to see if there is something strange ? Can you check if the issue is that some scans have the same RT: which(duplicated()) ? Can you paste the last six RTs, where this problem could loom ? Can you tell us why some files have three times (1788) as many scans as others ( 517) ? Yours, Steffen

davidgarciam commented 5 years ago

Thank you @sneumann. So, I've tried what you suggested to:

plot these 1444 numbers to see if there is something strange ? They seem to be OK

image

image

Can you check if the issue is that some scans have the same RT: which(duplicated())?

which(duplicated(rtime(bpis[1,1])) integer(0) The same result for the others. There are no duplicated rt's

Can you paste the last six RTs, where this problem could loom? These are the tails, they seem to be OK.

a length_rt t1 t2 t3 t4 t5 t6 PMP-250-1 1444 725.0 725.5 726.0 726.5 727.0 727.5 PMP-250-2 1788 897.0 897.5 898.0 898.5 899.0 899.5 PMP-325-1 1239 622.5 623.0 623.5 624.0 624.5 625.0 PMP-325-2 1553 779.5 780.0 780.5 781.0 781.5 782.0 PMP-325-3 1286 646.0 646.5 647.0 647.5 648.0 648.5 PMP-370-1 517 261.5 262.0 262.5 263.0 263.5 264.0 PMP-370-2 526 266.0 266.5 267.0 267.5 268.0 268.5 PMP-370-3 1548 777.0 777.5 778.0 778.5 779.0 779.5

Can you tell us why some files have three times (1788) as many scans as others ( 517)? I didn't obtain the data by myself, they were provided by a colleague. I guess he used more scans in some runs than in others. Since I'm exploring his data, I was surprised too by this fact.

Thanks!

sneumann commented 5 years ago

Ok, then my last idea would be to check which(rtime(bpis[1,1])) != sorted(rtime(bpis[1,1]))) to see where the sorting breaks. You can also make one affected file available and I could have a quick look. Yours, Steffen

jorainer commented 5 years ago

Just a note: in principle it should also be possible to sort the spectra in the OnDiskMSnExp object (i.e. your raw_data).

davidgarciam commented 5 years ago

Well, First I detected the affected files as suggested by prof. @jorainer

tmp <- split(raw_data, f = fromFile(raw_data)) vapply(tmp, function(z) is.unsorted(rtime(z)), logical(1)) 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE

Previously, I didn't find duplicated rt's. So I tried to check the rt order in one of the affected files using prof. @sneumann idea:

which(rtime(bpis[1,1]) != sort(rtime(bpis[1,1])) named integer(0)

So they seem to be ordered...

Prof. @jorainer I'm not sure of how to sort the spectra data in the object... I have to extract the rtime, sort it, and return them to the object?.

These are some of the files, 1 affected and the other 2 normals.

Thank you so much for your constant replies and help...

GC.zip

sneumann commented 5 years ago

Hi, The second scan in the files has an RT of 0.0:

grep TimeInMinutes  PMP-100-CUESCO-1.mzData
...
<cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="0.100000"></cvParam>
...
<cvParam cvLabel="psi" accession="PSI:1000038" name="TimeInMinutes" value="0.000000"></cvParam>

not sure how this gets there. Yours, Steffen

sneumann commented 5 years ago

The other thing where we might need input from @jorainer is

Browse[2]> head(sort(rt))
F1.S0001 F2.S0001 F1.S0002 F2.S0002 F1.S0003 F2.S0003 
 6.00000  6.00000  6.49998  6.49998  7.00002  7.00002 

in particular, what are F1.X and F2.X ? Yours, Steffen

michaelwitting commented 5 years ago

F1.X denotes spectra from file one and F2.X spectra from file two, e.g. F1.S0001 means first spectrum from file one and F1.S0002 second spectrum and so on...

jorainer commented 5 years ago

Regarding how to sort, I can help there, but we might first want to ensure that the input data is correct. As @sneumann points out, how can you have retention times of 0 in your files? That might be an issue.

Now, if only the 0 retention times are a problem you could filter your object by retention time keeping spectra with a retention time > 0:

raw_data <- filterRt(raw_data, c(0.1, max(rtime(raw_data))))

and then try to run the analysis.