tidymass / massprocesser

Raw data processing for mass spectrometry data
https://massprocesser.tidymass.org/
GNU General Public License v3.0

process_data error for negative polarity #1

Open · andrewjkwok opened this issue 1 year ago

andrewjkwok commented 1 year ago

Hi,

Thank you for this very comprehensive suite of software for metabolomics. I was attempting to use it to preprocess the negative-polarity raw data in my dataset, but ran into an issue after the initial convert_raw_data step. I was able to generate my mzXML files successfully, but when I feed them into the process_data function, I get the following error:

Error in names(val) <- featureNames(object) : 
  attempt to set an attribute on NULL
Error in massprocesser::process_data(path = "./", polarity = "negative",  : 
  Error in xcms::findChromPeaks.

Looking at the source code (https://rdrr.io/github/tidymass/massprocesser/src/R/process_data.R), I find:

if (is(xdata, class2 = "try-error")) {
  stop("Error in xcms::findChromPeaks.")
}

This suggests to me that there is something wrong with my data class...? Here is the traceback, which doesn't seem very helpful:

> traceback()
2: stop("Error in xcms::findChromPeaks.")
1: massprocesser::process_data(path = "./", polarity = "negative", 
       ppm = 15, peakwidth = c(5, 30), snthresh = 5, noise = 500, 
       threads = 6, output_tic = TRUE, output_bpc = TRUE, output_rt_correction_plot = TRUE, 
       min_fraction = 0.5, fill_peaks = FALSE)

I don't run into this problem with the positive-polarity data. Please let me know what else I can provide to help, and many thanks in advance.
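A note on the traceback above: in base R, try() catches an error and returns an object of class "try-error" instead of stopping, so a check like the one quoted from process_data.R fires for any error raised inside xcms::findChromPeaks, and traceback() then only sees the later stop() call. Unless try() is run with silent = TRUE, it also prints the underlying message, which is presumably where the first error line above comes from. The sketch below shows how the original condition can be recovered from a try-error object; fake_find_peaks() is a hypothetical stand-in, not xcms code.

```r
# fake_find_peaks() is a hypothetical stand-in for a step that fails deep
# inside a package; it is NOT xcms::findChromPeaks.
fake_find_peaks <- function(object) {
  if (length(object) == 0) stop("no spectra found in input")
  object
}

# try() returns a "try-error" object instead of aborting the caller.
res <- try(fake_find_peaks(list()), silent = TRUE)

if (inherits(res, "try-error")) {
  # The original condition is stored as the "condition" attribute, so its
  # message can be reported instead of a generic wrapper message.
  cond <- attr(res, "condition")
  message("Underlying error: ", conditionMessage(cond))
}
```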

andrewjkwok commented 1 year ago

Hello, I wanted to check whether there is any update on this issue. Many thanks in advance.

jaspershen commented 1 year ago

Hi there. It is difficult to find the problem without the code and data you used, and I am not sure how many mzXML files you are processing. I recommend running the process_data function again on just 2 or 3 files; if the error persists, please share the data and code with me so I can try to identify and fix the issue. Thank you.
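Re-running on 2 or 3 files can be scripted so that the full dataset stays untouched. A minimal sketch, assuming the mzXML files sit somewhere under one folder; subfolders are preserved in case they are used for sample grouping. make_test_subset() is a hypothetical helper, not a massprocesser function.

```r
# Hypothetical helper (not part of massprocesser): copy the first `n` mzXML
# files found under `src` into `dest`, keeping their relative subfolders.
make_test_subset <- function(src, dest, n = 3) {
  files <- list.files(src, pattern = "\\.mzxml$", ignore.case = TRUE,
                      recursive = TRUE)
  # Take the first n files in sorted order; other files are ignored.
  files <- head(sort(files), n)
  for (f in files) {
    target <- file.path(dest, f)
    dir.create(dirname(target), recursive = TRUE, showWarnings = FALSE)
    file.copy(file.path(src, f), target, overwrite = TRUE)
  }
  invisible(file.path(dest, files))
}
```

The resulting folder could then be passed as the path argument of massprocesser::process_data(), with polarity = "negative" and the other settings from the original call.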

andrewjkwok commented 1 year ago

Hi, thanks for the reply. I'm still running into the problem with a more limited set of files (I only have a single mzXML file per sample). I'm happy to share the data and code; is there an email address I can send a Google Drive link to?

jaspershen commented 1 year ago

shenxt@stanford.edu

andrewjkwok commented 1 year ago

Fantastic, thanks. I have shared the link with the data and script. Please let me know whether anything else is needed and whether you can reproduce the error on your side.

This is my session info for reference:


R version 4.2.2 Patched (2022-11-10 r83330)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8        LC_COLLATE=C.UTF-8    
 [5] LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8    LC_PAPER=C.UTF-8       LC_NAME=C             
 [9] LC_ADDRESS=C           LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] massconverter_1.0.3   tictoc_1.2            lubridate_1.9.2       forcats_1.0.0        
 [5] stringr_1.5.0         purrr_1.0.1           readr_2.1.4           tibble_3.2.1         
 [9] tidyverse_2.0.0       metid_1.2.28          metpath_1.0.5         ComplexHeatmap_2.14.0
[13] mixOmics_6.22.0       lattice_0.20-45       MASS_7.3-58           massstat_1.0.4       
[17] tidyr_1.3.0           ggfortify_0.4.16      massqc_1.0.6          masscleaner_1.0.11   
[21] xcms_3.20.0           MSnbase_2.24.2        ProtGenerics_1.30.0   S4Vectors_0.36.2     
[25] mzR_2.32.0            Rcpp_1.0.10           Biobase_2.58.0        BiocGenerics_0.44.0  
[29] BiocParallel_1.32.6   massprocesser_1.0.10  ggplot2_3.4.2         dplyr_1.1.2          
[33] magrittr_2.0.3        masstools_1.0.10      massdataset_1.0.24    tidymass_1.0.8       

loaded via a namespace (and not attached):
  [1] utf8_1.2.3                  tidyselect_1.2.0            robust_0.7-1               
  [4] htmlwidgets_1.6.2           munsell_0.5.0               codetools_0.2-18           
  [7] preprocessCore_1.60.2       future_1.32.0               withr_2.5.0                
 [10] colorspace_2.1-0            knitr_1.43                  rstudioapi_0.14            
 [13] robustbase_0.95-1           mzID_1.36.0                 listenv_0.9.0              
 [16] MatrixGenerics_1.10.0       GenomeInfoDbData_1.2.9      polyclip_1.10-4            
 [19] farver_2.1.1                parallelly_1.36.0           vctrs_0.6.2                
 [22] generics_0.1.3              xfun_0.39                   timechange_0.2.0           
 [25] itertools_0.1-3             randomForest_4.7-1.1        R6_2.5.1                   
 [28] doParallel_1.0.17           GenomeInfoDb_1.34.9         graphlayouts_1.0.0         
 [31] clue_0.3-64                 MsCoreUtils_1.10.0          bitops_1.0-7               
 [34] DelayedArray_0.24.0         scales_1.2.1                ggraph_2.1.0               
 [37] nnet_7.3-18                 gtable_0.3.3                affy_1.76.0                
 [40] globals_0.16.2              tidygraph_1.2.3             rlang_1.1.1                
 [43] GlobalOptions_0.1.2         Rdisop_1.58.0               lazyeval_0.2.2             
 [46] impute_1.72.3               checkmate_2.2.0             BiocManager_1.30.21        
 [49] reshape2_1.4.4              stevedore_0.9.5             backports_1.4.1            
 [52] Hmisc_5.1-0                 MassSpecWavelet_1.64.1      tools_4.2.2                
 [55] affyio_1.68.0               RColorBrewer_1.1-3          proxy_0.4-27               
 [58] plyr_1.8.8                  base64enc_0.1-3             progress_1.2.2             
 [61] zlibbioc_1.44.0             RCurl_1.98-1.12             prettyunits_1.1.1          
 [64] rpart_4.1.16                viridis_0.6.3               pbapply_1.7-0              
 [67] GetoptLong_1.0.5            SummarizedExperiment_1.28.0 ggrepel_0.9.3              
 [70] cluster_2.1.4               furrr_0.3.1                 data.table_1.14.8          
 [73] RSpectra_0.16-1             openxlsx_4.2.5.2            circlize_0.4.15            
 [76] RANN_2.6.1                  pcaMethods_1.90.0           mvtnorm_1.2-2              
 [79] matrixStats_1.0.0           hms_1.1.3                   patchwork_1.1.2            
 [82] evaluate_0.21               XML_3.99-0.14               readxl_1.4.2               
 [85] fastDummies_1.6.3           IRanges_2.32.0              gridExtra_2.3              
 [88] shape_1.4.6                 compiler_4.2.2              ellipse_0.4.5              
 [91] ncdf4_1.21                  crayon_1.5.2                htmltools_0.5.5            
 [94] corpcor_1.6.10              pcaPP_2.0-3                 tzdb_0.4.0                 
 [97] Formula_1.2-5               rrcov_1.7-3                 tweenr_2.0.2               
[100] MsFeatures_1.6.0            Matrix_1.5-1                cli_3.6.1                  
[103] vsn_3.66.0                  parallel_4.2.2              igraph_1.4.3               
[106] GenomicRanges_1.50.2        pkgconfig_2.0.3             fit.models_0.64            
[109] foreign_0.8-82              plotly_4.10.2               MALDIquant_1.22.1          
[112] foreach_1.5.2               rARPACK_0.11-0              ggcorrplot_0.1.4           
[115] missForest_1.5              rngtools_1.5.2              XVector_0.38.0             
[118] doRNG_1.8.6                 digest_0.6.31               Biostrings_2.66.0          
[121] rmarkdown_2.22              cellranger_1.1.0            htmlTable_2.4.1            
[124] curl_5.0.1                  rjson_0.2.21                lifecycle_1.0.3            
[127] jsonlite_1.8.5              viridisLite_0.4.2           limma_3.54.2               
[130] fansi_1.0.4                 pillar_1.9.0                ggsci_3.0.0                
[133] KEGGREST_1.38.0             fastmap_1.1.1               httr_1.4.6                 
[136] DEoptimR_1.0-14             glue_1.6.2                  remotes_2.4.2              
[139] zip_2.3.0                   png_0.1-8                   iterators_1.0.14           
[142] ggforce_0.4.1               class_7.3-20                stringi_1.7.12             
[145] e1071_1.7-13

jaspershen commented 1 year ago

Hi, I just checked the issue, and I found that this error is caused by your data, not the package. After your raw data are converted to mzXML format, each file is only around 20b in size, which is abnormal. I then checked the massconverter package with my demo raw data: with the same package and the same code, it produces normal mzXML data, so the massconverter package is also OK. That means the issue comes from your raw data. I recommend trying the msconvert software (which only supports Windows). If you can get normal mzXML data from it, that would suggest the massconverter package has a bug; if you still can't get normal mzXML data, we can confirm that your raw data may have an issue. Please let me know the results when you finish.
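The symptom described here, converted mzXML files that are tiny, is easy to screen for before running process_data. A minimal sketch; check_mzxml_sizes() is a hypothetical helper, and the 1 MB threshold is an assumption chosen only because a real LC-MS run is normally far larger.

```r
# Hypothetical helper: list mzXML files under `path` with their sizes and
# flag any file smaller than `min_bytes` (assumed threshold: 1 MB).
check_mzxml_sizes <- function(path, min_bytes = 1e6) {
  files <- list.files(path, pattern = "\\.mzxml$", ignore.case = TRUE,
                      recursive = TRUE, full.names = TRUE)
  sizes <- file.size(files)
  data.frame(file = basename(files),
             bytes = sizes,
             suspicious = sizes < min_bytes,
             stringsAsFactors = FALSE)
}
```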

andrewjkwok commented 1 year ago

Thanks, this is helpful. I will try to get msconvert working and will let you know the results over the next few days.

andrewjkwok commented 1 year ago

Hi, I can confirm that with msconvert I can produce mzXML files of a fairly large size (380 MB), so I'm guessing the bug is on the massconverter package's side? I've uploaded 4 test mzXML files to the same Google Drive link for reference.
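Since the original problem only affected negative-polarity data, one further check that could be run on the massconverter and msconvert outputs side by side is to count how many scans carry each polarity. A minimal sketch, assuming the writer stores polarity as a double-quoted polarity attribute on each scan element, as the mzXML format defines; count_scan_polarity() is hypothetical, and a dedicated reader such as mzR (openMSfile() followed by header()) is the more robust route.

```r
# Hypothetical helper: stream an mzXML file in chunks of lines and count
# polarity="+" / polarity="-" attributes. This is a rough text-level check,
# not a full mzXML parser; it assumes double-quoted attribute values.
count_scan_polarity <- function(file, chunk_lines = 10000L) {
  con <- file(file, open = "r")
  on.exit(close(con))
  counts <- c("+" = 0L, "-" = 0L)
  repeat {
    lines <- readLines(con, n = chunk_lines, warn = FALSE)
    if (length(lines) == 0) break
    hits <- regmatches(lines, gregexpr('polarity="[+-]"', lines))
    # Character 11 of 'polarity="X"' is the sign itself.
    signs <- substr(unlist(hits), 11, 11)
    counts["+"] <- counts["+"] + sum(signs == "+")
    counts["-"] <- counts["-"] + sum(signs == "-")
  }
  counts
}
```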

andrewjkwok commented 1 year ago

Hello, I just wanted to quickly check whether there is any update on this issue.