wenbostar / proteoQC

proteoQC: an R package for proteomics data quality assessment.
http://bioconductor.org/packages/devel/bioc/html/proteoQC.html

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded #2

Closed HuaZou closed 3 years ago

HuaZou commented 3 years ago

Hi, after installing the development version of the proteoQC R package from GitHub, I ran the following code:

library(proteoQC)

mgf <- "FASP_210107_1.mgf"
fasta <- "uniprot-proteome_20210223.fasta"

msQCpipe(spectralist = mgf, 
         fasta = fasta, 
         outdir = "./qc",
         miss  = 0,
         enzyme = 1, varmod = 2, fixmod = 1,
         tol = 10, itol = 0.6, cpu = 20,
         mode = "identification")

It reported the following errors:

2021-02-25 17:28:22 
Loading spectra
 (mgf).............................................................................. loaded.
Spectra matching criteria = 120254
Starting threads .................... started.
Computing models:
    testing 1 2 3testing 1 2 3testing 1 2 3testing 1  | 50 ks 
    2 3testing 1 2 3testing 1 2 3testing 1 2 3testing | 100 ks 
     1 2 3testing 1 2 3testing 1 2 3testing 1 2 3test | 150 ks 
    in
        sequences modelled = 152 ks
Model refinement:
    partial cleavage ..... done.
    unanticipated cleavage ..... done.
    modified N-terminus ..... done.
    finishing refinement ... done.
Merging results:
    from 2......3......4......5......6......7......8......9......10......11......12......13......14......15......16......17......18......19......20......

Creating report:
    initial calculations  ..... done.
    sorting  ..... done.
    finding repeats ..... done.
    evaluating results ..... done.
    calculating expectations ..... done.
    writing results ..... done.

Valid models = 103883

Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3664)
    at java.lang.String.<init>(String.java:207)
    at com.sun.org.apache.xerces.internal.xni.XMLString.toString(XMLString.java:188)
    at com.sun.org.apache.xerces.internal.parsers.AbstractDOMParser.characters(AbstractDOMParser.java:1228)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:455)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:842)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:771)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:339)
    at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:205)
    at de.proteinms.xtandemparser.parser.XTandemParser.parseXTandemFile(XTandemParser.java:121)
    at de.proteinms.xtandemparser.parser.XTandemParser.<init>(XTandemParser.java:82)
    at de.proteinms.xtandemparser.xtandem.XTandemFile.<init>(XTandemFile.java:89)
    at cn.bgi.XTandemParser.main(XTandemParser.java:73)
Process file: ./qc/result/FASP_210107_1.mgf_xtandem.xml... 
Error in read.table(logfile, sep = "\t", header = TRUE, stringsAsFactors = FALSE) : no lines available in input

As the log shows, the failure is caused by "java.lang.OutOfMemoryError: GC overhead limit exceeded", but when I searched for this error on Google I couldn't find any useful information. I would appreciate any advice when you have time.

System information for running proteoQC:

R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 8 (Core)

Matrix products: default
BLAS/LAPACK: /disk/share/anaconda3/lib/libopenblasp-r0.3.10.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
 [1] stats4    parallel  grid      stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] proteoQC_1.13.5     MSnbase_2.16.1      ProtGenerics_1.20.0 S4Vectors_0.28.1    mzR_2.24.1          Rcpp_1.0.4.6       
 [7] Biobase_2.50.0      BiocGenerics_0.36.0 VennDiagram_1.6.20  futile.logger_1.4.3 XML_3.99-0.5        tibble_3.0.4       
[13] dplyr_1.0.2        

loaded via a namespace (and not attached):
 [1] bitops_1.0-6          bit64_4.0.5           doParallel_1.0.16     progress_1.2.2        httr_1.4.2            tools_4.0.2          
 [7] R6_2.5.0              affyio_1.60.0         lazyeval_0.2.2        DBI_1.1.0             colorspace_2.0-0      ade4_1.7-16          
[13] tidyselect_1.1.0      prettyunits_1.1.1     bit_4.0.4             Nozzle.R1_1.1-1       curl_4.3              compiler_4.0.2       
[19] preprocessCore_1.52.0 formatR_1.7           rTANDEM_1.23.1        xml2_1.3.2            plotly_4.9.2.1        scales_1.1.1         
[25] affy_1.68.0           rappdirs_0.3.1        stringr_1.4.0         digest_0.6.27         rmarkdown_2.5         pkgconfig_2.0.3      
[31] htmltools_0.5.0       dbplyr_2.0.0          limma_3.44.3          htmlwidgets_1.5.2     rlang_0.4.9           rstudioapi_0.11      
[37] RSQLite_2.2.1         impute_1.64.0         generics_0.1.0        jsonlite_1.7.1        mzID_1.28.0           BiocParallel_1.24.0  
[43] RCurl_1.98-1.2        magrittr_2.0.1        MALDIquant_1.19.3     munsell_0.5.0         lifecycle_0.2.0       vsn_3.58.0           
[49] stringi_1.5.3         yaml_2.2.1            MASS_7.3-53           zlibbioc_1.36.0       plyr_1.8.6            BiocFileCache_1.14.0 
[55] rpx_1.26.1            blob_1.2.1            crayon_1.3.4          lattice_0.20-41       hms_0.5.3             knitr_1.30           
[61] pillar_1.4.7          seqinr_4.2-4          reshape2_1.4.4        codetools_0.2-16      futile.options_1.0.1  glue_1.4.2           
[67] evaluate_0.14         data.table_1.13.2     pcaMethods_1.80.0     lambda.r_1.2.4        BiocManager_1.30.10   vctrs_0.3.5          
[73] foreach_1.5.1         gtable_0.3.0          purrr_0.3.4           tidyr_1.1.2           assertthat_0.2.1      ggplot2_3.3.2        
[79] xfun_0.18             viridisLite_0.3.0     ncdf4_1.17            iterators_1.0.13      tinytex_0.26          memoise_1.1.0        
[85] IRanges_2.24.1        ellipsis_0.3.1  

wenbostar commented 3 years ago

This is a memory issue. You just need to raise the memory limit available to the function: for example, set xmx to 8 (if your computer has more than 8 G of memory). By default, it is 2 G.

library(proteoQC)

mgf <- "FASP_210107_1.mgf"
fasta <- "uniprot-proteome_20210223.fasta"

msQCpipe(spectralist = mgf, 
         fasta = fasta, 
         outdir = "./qc",
         miss  = 0,
         enzyme = 1, varmod = 2, fixmod = 1,
         tol = 10, itol = 0.6, cpu = 20,
         mode = "identification",
         xmx=8)
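
How large to make xmx depends on how much memory is actually free on the machine. The following is only a minimal sketch, assuming a Linux system that exposes /proc/meminfo and assuming xmx is given in gigabytes as in the call above. The helper name suggest_xmx and the 50% fraction are illustrative assumptions, not part of proteoQC.

```r
# Hypothetical helper (NOT part of proteoQC): suggest an xmx value in GB
# from /proc/meminfo-style text. Assumes the Linux "MemAvailable: <n> kB" line.
suggest_xmx <- function(meminfo_lines, fraction = 0.5, floor_gb = 2) {
  line <- grep("^MemAvailable:", meminfo_lines, value = TRUE)
  # If the field is missing, fall back to the 2 G default mentioned above.
  if (length(line) == 0) return(floor_gb)
  kb <- as.numeric(gsub("[^0-9]", "", line[1]))  # keep only the digits
  gb <- floor(kb / 1024^2 * fraction)            # kB -> GB, then take a share
  max(floor_gb, gb)
}

# Usage on Linux (assumption: /proc/meminfo is readable):
# xmx <- suggest_xmx(readLines("/proc/meminfo"))
# msQCpipe(spectralist = mgf, fasta = fasta, outdir = "./qc",
#          mode = "identification", xmx = xmx)
```

Taking only a share of the available memory leaves room for R itself and for the search threads started with cpu = 20.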

HuaZou commented 3 years ago

Setting xmx=8 works well. Thank you very much for your answer. Five stars for you.

wenbostar commented 3 years ago

Thanks.