sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

chromatograms() function extracts all chromatograms instead of a specific ID #172

Open shubham1637 opened 6 years ago

shubham1637 commented 6 years ago

Hi, I have a large .mzML file (~6 GB). When I use the chromatograms() function, it extracts all chromatograms and loads them into memory. This is very expensive in both memory and run time. Is there a way to extract only a specific chromatogram without loading everything into memory?

lgatto commented 6 years ago

What code are you using? chromatogram is a generic function - without knowing what object is passed to it, we can't know what code is executed. Guessing you use it on an mzRpwiz object, you can use chromatogram(f, i) to extract the ith chromatogram. This is documented in ?chromatogram,mzRpwiz.

shubham1637 commented 6 years ago

Thanks! Yes, I am using an mzRpwiz object. Is there a way to extract a chromatogram using its "idRef/name", or to get the chromatogram index for a particular analyte?

lgatto commented 6 years ago

You should probably try MSnbase::readSRMData and see if the information you are interested in is recorded in the feature data that can be accessed with fData.

jorainer commented 6 years ago

So far I haven't thought of an onDisk mode for Chromatogram objects, so readSRMData will read all of the chromatograms into memory and will not work in your case. You are right to use the functions from mzR on the mzRpwiz object you already have.

However, chromatograms will read ALL chromatograms from the file. You will want to use the chromatogram method, which reads only the chromatograms at the indices you provide. To get an overview of all chromatograms (and hence determine the indices of the chromatograms that you want to extract), use the chromatogramHeader function from mzR on your mzRpwiz object.

So the usage is mzR::chromatogram(msfile, chrom = <index>) with msfile the mzRpwiz object.
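A minimal sketch of that workflow, under some assumptions: the header returned by chromatogramHeader is taken to be a data.frame with a `chromatogramId` column whose rows are in file order, so a row position can be passed as `chrom`. The helper names `find_chrom_index` and `read_chroms_by_id` are hypothetical and not part of mzR.

```r
# Map chromatogram IDs to their positions in the header table.
# Assumption: `hdr` has a `chromatogramId` column, rows in file order.
find_chrom_index <- function(hdr, ids) {
  idx <- match(ids, hdr$chromatogramId)
  if (anyNA(idx)) {
    stop("No chromatogram found for: ",
         paste(ids[is.na(idx)], collapse = ", "))
  }
  idx
}

# Open the file, look up the requested IDs, and read only those chromatograms.
read_chroms_by_id <- function(path, ids) {
  msfile <- mzR::openMSfile(path, backend = "pwiz")
  hdr <- mzR::chromatogramHeader(msfile)   # one row per chromatogram
  idx <- find_chrom_index(hdr, ids)
  mzR::chromatogram(msfile, chrom = idx)   # selected chromatograms only
}

# Usage (file name from this thread; "some_id" is a placeholder ID):
# chr <- read_chroms_by_id("exp1.chrom.mzML", "some_id")
```

Keeping the ID lookup in its own function means it can be checked on a small hand-made data.frame without opening a large file.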

shubham1637 commented 6 years ago

Thanks! I think the chromatogramHeader function will be useful for me; however, I do not see this function in my R environment. I am using mzR version 2.12; I assume it might be available in the latest mzR 2.14. When I try to update, it always fetches mzR_2.12.0.tar.gz. Do you know how to install the latest version of mzR?

biocLite("mzR")
BioC_mirror: https://bioconductor.org
Using Bioconductor 3.6 (BiocInstaller 1.28.0), R 3.4.4 (2018-03-15).
Installing package(s) ‘mzR’
trying URL 'https://bioconductor.org/packages/3.6/bioc/src/contrib/mzR_2.12.0.tar.gz'
Content type 'application/x-gzip' length 5420945 bytes (5.2 MB)

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.12.0   Rcpp_0.12.17

loaded via a namespace (and not attached):
[1] compiler_3.4.4      ProtGenerics_1.10.0 parallel_3.4.4      tools_3.4.4         yaml_2.1.19        
[6] Biobase_2.38.0      codetools_0.2-15    BiocGenerics_0.24.0

jorainer commented 6 years ago

You will need R version 3.5.0. If you then install Bioconductor (as described on their homepage), you will get Bioconductor version 3.7, which has the mzR package with these functions.
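After upgrading, a quick check along these lines may help confirm that the installed mzR actually provides the function. This is only a sketch: the helper name `mzr_has_chrom_header` is hypothetical, and the 2.14.0 threshold is an assumption based on the version mentioned in this thread, not on the package changelog.

```r
# Report whether the installed mzR exports chromatogramHeader().
# Assumption: 2.14.0 is the first version that has it (from this thread).
mzr_has_chrom_header <- function(min_version = "2.14.0") {
  if (!requireNamespace("mzR", quietly = TRUE)) {
    return(FALSE)   # mzR is not installed at all
  }
  utils::packageVersion("mzR") >= min_version &&
    "chromatogramHeader" %in% getNamespaceExports("mzR")
}

cat("R version:", as.character(getRversion()), "\n")
cat("mzR provides chromatogramHeader():", mzr_has_chrom_header(), "\n")
```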

shubham1637 commented 6 years ago

Thanks!

shubham1637 commented 6 years ago

I have installed the latest version of mzR on R 3.5.0. I now have trouble loading files that I was able to load before.

mz <- openMSfile(filename, backend = "pwiz")
Error: Can not open file exp1.chrom.mzML! Original error was: Error in pwizModule$open(filename): std::bad_alloc

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/intel2016.4/r/3.5.0/lib64/R/lib/libR.so
LAPACK: /cvmfs/soft.computecanada.ca/easybuild/software/2017/avx2/Compiler/intel2016.4/r/3.5.0/lib64/R/modules/lapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.14.0   Rcpp_0.12.17

loaded via a namespace (and not attached):
[1] compiler_3.5.0      ProtGenerics_1.12.0 parallel_3.5.0     
[4] Biobase_2.40.0      codetools_0.2-15    BiocGenerics_0.26.0

shubham1637 commented 6 years ago

The above problem occurred because I had not allocated enough memory. openMSfile is supposed to just create an interface, and that is what it is doing. I don't understand why the cluster would have to allocate such a large amount of memory for this task.

shubham1637 commented 4 years ago

Hi, I have a question related to the last comment. I have a 1.7 GB chromatogram mzML file. When I use openMSfile, it takes 15 seconds to get the mzRpwiz object. I then use mzR::chromatogramHeader to get chromatogramID and chromatogramIndex, which takes 6 seconds.

I assume that the indices are read by mzR::chromatogramHeader, so what tasks is openMSfile performing? My project is on visualization, so I am trying to fetch chromatogramID and chromatogramIndex quickly.
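To make the two timings reproducible, each step can be measured separately. The sketch below only measures elapsed time and says nothing about what openMSfile does internally; the helper names `elapsed` and `profile_open_and_header` are hypothetical.

```r
# Elapsed wall-clock seconds for evaluating one expression.
elapsed <- function(expr) system.time(expr)[["elapsed"]]

# Time opening the file and reading the chromatogram header separately.
profile_open_and_header <- function(path) {
  # The assignments run in this function's frame when the argument is forced.
  t_open <- elapsed(msfile <- mzR::openMSfile(path, backend = "pwiz"))
  t_hdr  <- elapsed(hdr <- mzR::chromatogramHeader(msfile))
  c(open = t_open, header = t_hdr, n_chromatograms = nrow(hdr))
}

# Usage (file name from this thread):
# profile_open_and_header("exp1.chrom.mzML")
```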