sneumann / xcms

This is the git repository matching the Bioconductor package xcms: LC/MS and GC/MS Data Analysis

groupChromPeaks error with more files but not with fewer #587

Closed eterlova closed 2 years ago

eterlova commented 2 years ago

CorWorksFewerFiles.Rdata.zip

Hello! I am trying to perform peak correspondence on my LCMS data. I succeeded with one set of files, but the same code fails when I add more, giving me the error:

pdp <- PeakDensityParam(sampleGroups = xdata_rtaligned$sample_group,
                        minFraction = 0.4, bw = 30)
xdata_correspondence <- groupChromPeaks(xdata_rtaligned, param = pdp)
Error: The 'sampleGroups' value in the provided 'param' class does not match the number of available files/samples!

This happens even though I edited the phenodata file when I added more samples to the analysis. Does anyone have an idea what might be going wrong?

Files attached

eterlova commented 2 years ago

CorWorksFewerFiles.Rdata.zip

eterlova commented 2 years ago

@jorainer @sneumann Sorry for being annoying, but do you have any idea what might be going on here?

sneumann commented 2 years ago

Hi, I'm not sure where the issue is coming from.

The error message is found in: https://github.com/sneumann/xcms/search?q=The+%27sampleGroups%27+value+in+the+provided+%27param%27+class+does+not+match+the+number+of+available+files%2Fsamples%21

The failing check is thus: https://github.com/sneumann/xcms/blob/0eb8053ad2a05be3c759f4be90df913f3b2b1c31/R/methods-XChromatograms.R#L316

So you could check why that fails. Yours, Steffen
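A minimal base-R sketch of the kind of count check Steffen points to, assuming only that the condition compares the number of sample groups with the number of files (as the `length(sampleGroups(param)) != ncol(object)` expression quoted later in this thread suggests). `check_sample_groups()` and the toy inputs are hypothetical; this is not the xcms source code:

```r
# Hypothetical re-creation of the count check (not xcms code):
# correspondence stops when the number of sample groups differs from
# the number of files/samples in the object.
check_sample_groups <- function(sample_groups, n_files) {
  mismatch <- length(sample_groups) != n_files  # TRUE means the check fails
  if (mismatch)
    stop("length(sampleGroups) is ", length(sample_groups),
         " but the object has ", n_files, " files")
  invisible(TRUE)
}

# Toy inputs: 3 groups for 3 files passes, 2 groups for 3 files fails
ok <- check_sample_groups(c("A", "A", "B"), n_files = 3)
res <- tryCatch(check_sample_groups(c("A", "B"), n_files = 3),
                error = function(e) conditionMessage(e))
print(res)
```

Note that the failing condition is the one that evaluates to TRUE. With the real objects, the analogous comparison would be `length(sampleGroups(pdp))` against `length(fileNames(xdata_rtaligned))`, both calls that appear in this thread.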

jorainer commented 2 years ago

Sorry for my late reply - busy times. Actually, for me it works (ignore the errors - I don't have the original mzML files, thus the result object is not considered valid):

pdp <- PeakDensityParam(sampleGroups = xdata_rtaligned$sample_group, minFraction = 0.4, bw = 30)
xdata_correspondence <- groupChromPeaks(xdata_rtaligned, param = pdp)
Processing 15111 mz slices ... OK
Error in validObject(object) : 
  invalid class “XCMSnExp” object: 1: Required data file 'DA_metabolomics2021_sample_013.mzML' not found!
invalid class “XCMSnExp” object: 2: Required data file 'DA_metabolomics2021_sample_026.mzML' not found!
invalid class “XCMSnExp” object: 3: Required data file 'DA_metabolomics2021_sample_056.mzML' not found!
invalid class “XCMSnExp” object: 4: Required data file 'DA_metabolomics2021_sample_063.mzML' not found!
invalid class “XCMSnExp” object: 5: Required data file 'DA_metabolomics2021_sample_110.mzML' not found!
invalid class “XCMSnExp” object: 6: Required data file 'DA_metabolomics2021_sample_111.mzML' not found!
invalid class “XCMSnExp” object: 7: Required data file 'DA_metabolomics2021_sample_127.mzML' not found!
invalid class “XCMSnExp” object: 8: Required data file 'DA_metabolomics2021_sample_128.mzML' not found!
invalid class “XCMSnExp” object: 9: Required data file 'DA_metabolomics2021_sample_133.mzML' not found!
invalid class “XC

Could you perhaps provide the output of your sessionInfo()? Mine is below (I'm using the current R and the current Bioconductor release, 3.14):

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
[1] xcms_3.16.1         MSnbase_2.20.3      ProtGenerics_1.26.0
[4] S4Vectors_0.32.3    mzR_2.28.0          Rcpp_1.0.7         
[7] Biobase_2.54.0      BiocGenerics_0.40.0 BiocParallel_1.28.2

loaded via a namespace (and not attached):
 [1] lattice_0.20-45             assertthat_0.2.1           
 [3] digest_0.6.28               foreach_1.5.1              
 [5] utf8_1.2.2                  R6_2.5.1                   
 [7] GenomeInfoDb_1.30.0         plyr_1.8.6                 
 [9] mzID_1.32.0                 ggplot2_3.3.5              
[11] pillar_1.6.4                zlibbioc_1.40.0            
[13] rlang_0.4.12                Matrix_1.3-4               
[15] preprocessCore_1.56.0       RCurl_1.98-1.5             
[17] munsell_0.5.0               DelayedArray_0.20.0        
[19] compiler_4.1.2              MsFeatures_1.2.0           
[21] pkgconfig_2.0.3             pcaMethods_1.86.0          
[23] tidyselect_1.1.1            SummarizedExperiment_1.24.0
[25] tibble_3.1.6                GenomeInfoDbData_1.2.7     
[27] RANN_2.6.1                  IRanges_2.28.0             
[29] codetools_0.2-18            matrixStats_0.61.0         
[31] XML_3.99-0.8                fansi_0.5.0                
[33] crayon_1.4.2                dplyr_1.0.7                
[35] MASS_7.3-54                 bitops_1.0-7               
[37] MassSpecWavelet_1.60.0      grid_4.1.2                 
[39] gtable_0.3.0                lifecycle_1.0.1            
[41] affy_1.72.0                 DBI_1.1.1                  
[43] magrittr_2.0.1              MsCoreUtils_1.6.0          
[45] scales_1.1.1                ncdf4_1.18                 
[47] impute_1.68.0               XVector_0.34.0             
[49] affyio_1.64.0               doParallel_1.0.16          
[51] limma_3.50.0                robustbase_0.93-9          
[53] ellipsis_0.3.2              generics_0.1.1             
[55] vctrs_0.3.8                 RColorBrewer_1.1-2         
[57] iterators_1.0.13            tools_4.1.2                
[59] glue_1.5.0                  DEoptimR_1.0-9             
[61] purrr_0.3.4                 MatrixGenerics_1.6.0       
[63] parallel_4.1.2              clue_0.3-60                
[65] colorspace_2.0-2            cluster_2.1.2              
[67] BiocManager_1.30.16         vsn_3.62.0                 
[69] GenomicRanges_1.46.1        MALDIquant_1.20            

eterlova commented 2 years ago

Thank you both for replying! And sorry again for being so rude.

1) I checked length(sampleGroups(param)) != ncol(object) even before writing to you, but it returns TRUE for both data sets: in one case I get the error, while in the other the code just goes on to align my samples.

2) Johannes, it is so weird that it works for you! I tried both locally and on a cluster, but I get the same result. I noticed that I had an older version of xcms, so I updated it, but nothing changed. I do run the latest Bioconductor, but for some reason 4.1.0 is the latest version of R that I manage to install in my conda environment. Could that be the cause of my issue? My session info is below:

R version 4.1.0 (2021-05-18)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/FCAM/eterlova/miniconda3/envs/Rmetab/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] data.table_1.14.2           pander_0.6.4               
 [3] RColorBrewer_1.1-2          xcms_3.16.1                
 [5] MSnbase_2.20.1              ProtGenerics_1.26.0        
 [7] mzR_2.28.0                  Rcpp_1.0.7                 
 [9] BiocParallel_1.28.2         SummarizedExperiment_1.24.0
[11] Biobase_2.54.0              GenomicRanges_1.46.1       
[13] GenomeInfoDb_1.30.0         IRanges_2.28.0             
[15] S4Vectors_0.32.3            BiocGenerics_0.40.0        
[17] MatrixGenerics_1.6.0        matrixStats_0.61.0         
[19] magrittr_2.0.1             

loaded via a namespace (and not attached):
 [1] lattice_0.20-45        digest_0.6.29          foreach_1.5.1         
 [4] utf8_1.2.2             R6_2.5.1               plyr_1.8.6            
 [7] mzID_1.32.0            ggplot2_3.3.5          pillar_1.6.4          
[10] zlibbioc_1.40.0        rlang_0.4.12           Matrix_1.3-4          
[13] preprocessCore_1.56.0  RCurl_1.98-1.5         munsell_0.5.0         
[16] DelayedArray_0.20.0    compiler_4.1.0         MsFeatures_1.2.0      
[19] pkgconfig_2.0.3        pcaMethods_1.86.0      tibble_3.1.6          
[22] GenomeInfoDbData_1.2.7 RANN_2.6.1             codetools_0.2-18      
[25] XML_3.99-0.8           fansi_0.5.0            crayon_1.4.2          
[28] MASS_7.3-54            bitops_1.0-7           MassSpecWavelet_1.60.0
[31] grid_4.1.0             gtable_0.3.0           lifecycle_1.0.1       
[34] affy_1.72.0            MsCoreUtils_1.6.0      scales_1.1.1          
[37] ncdf4_1.18             impute_1.68.0          XVector_0.34.0        
[40] affyio_1.64.0          doParallel_1.0.16      limma_3.50.0          
[43] robustbase_0.93-9      ellipsis_0.3.2         vctrs_0.3.8           
[46] iterators_1.0.13       tools_4.1.0            glue_1.5.1            
[49] DEoptimR_1.0-9         parallel_4.1.0         clue_0.3-60           
[52] colorspace_2.0-2       cluster_2.1.2          BiocManager_1.30.16   
[55] vsn_3.62.0             MALDIquant_1.20       

jorainer commented 2 years ago

We both have the same version of xcms, so that cannot be the issue here. It is really puzzling. Could you start a clean, fresh R session somewhere (without pre-loading any previous .RData, i.e. make sure that ls() returns character(0) right after starting R), then load the RData file you provided above and try the same code lines I used?

eterlova commented 2 years ago

Hm, it still does not run:

R version 4.1.0 (2021-05-18) -- "Camp Pontanezen"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-conda-linux-gnu (64-bit)

> ls()
character(0)
> library(xcms)
[many lines omitted]
> load("CorErrorMoreFiles.Rdata")
> ls()
[1] "raw_data"        "rawdata_cent"    "xdata"           "xdata_pp"       
[5] "xdata_rtaligned"
> pdp <- PeakDensityParam(sampleGroups = xdata_rtaligned$sample_group, minFraction = 0.4, bw = 30)
> xdata_correspondence <- groupChromPeaks(xdata_rtaligned, param = pdp)
Error: The 'sampleGroups' value in the provided 'param' class does not match the number of available files/samples!
jorainer commented 2 years ago

Wait, I don't have a CorErrorMoreFiles.RData file; I only have CorWorksFewerFiles, and with that one it works for me. Could you please also provide the other file?

eterlova commented 2 years ago

Oh, sorry, I sent the same file twice. I cannot upload it here for some reason, so here is a link: https://drive.google.com/file/d/12wn_ktVE7LVaQcE2OuDZkqt8REEXqano/view?usp=sharing

jorainer commented 2 years ago

Hm, indeed, you have more files than sample annotations:

> length(fileNames(xdata_rtaligned))
[1] 127
> length(xdata_rtaligned$sample_group)
[1] 126

How did you read the data and add the sample annotations? Did you subset your data set (by sample) or filter it at any stage?
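Comparing counts shows that something is off; comparing file names shows which file is affected. A base-R sketch, assuming the annotation table has a column holding the mzML file names (`sample_name` and `find_unannotated()` are hypothetical names, and the file names below are made up):

```r
# Hypothetical helper: report files without an annotation row and
# annotation rows without a matching file, matching on file names.
find_unannotated <- function(files, pheno, file_col = "sample_name") {
  fls <- basename(files)
  ann <- basename(as.character(pheno[[file_col]]))
  list(missing_annotation = setdiff(fls, ann),
       missing_file = setdiff(ann, fls))
}

# Toy example: three files, but only two of them annotated
files <- c("data/s_001.mzML", "data/s_002.mzML", "data/s_003.mzML")
pd <- data.frame(sample_name = c("s_001.mzML", "s_003.mzML"),
                 sample_group = c("A", "B"))
res <- find_unannotated(files, pd)
print(res$missing_annotation)  # "s_002.mzML"
```

On a loaded object, `files` could be taken from `fileNames(xdata_rtaligned)`, as used above.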

eterlova commented 2 years ago

Oh no. I swear I checked this. I added the files to the annotation file manually and then loaded them with the readMSData function; I am sure that is where the error came from. I found the file that was missing from the annotation, so it was an error on my part and not worth the alarm. Closing the issue, and sorry again for bothering you.
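Since the mismatch came from editing the annotation file by hand, one way to guard against it is to reorder the annotation to match the file order and stop if any file has no row, before reading the data. A base-R sketch; `align_pheno()` and the `sample_name` column are hypothetical, and the file names are made up:

```r
# Hypothetical guard: return the annotation rows in the same order as the
# files, and stop if any file lacks an annotation row.
align_pheno <- function(files, pheno, file_col = "sample_name") {
  idx <- match(basename(files), basename(as.character(pheno[[file_col]])))
  if (anyNA(idx))
    stop("No annotation for: ",
         paste(basename(files)[is.na(idx)], collapse = ", "))
  out <- pheno[idx, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy example: files listed in a different order than the annotation
files <- c("data/s_002.mzML", "data/s_001.mzML")
pd <- data.frame(sample_name = c("s_001.mzML", "s_002.mzML"),
                 sample_group = c("A", "B"))
aligned <- align_pheno(files, pd)
print(aligned$sample_group)  # "B" "A"
```

The aligned table could then be passed on as in the common `readMSData(files, pdata = new("NAnnotatedDataFrame", aligned), mode = "onDisk")` pattern, so that every file is guaranteed to have exactly one annotation row in the right position.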

jorainer commented 2 years ago

No problem. I'm happy that you were able to fix it.