sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

R crash #217

Closed: drag05 closed this issue 4 years ago

drag05 commented 4 years ago

R (v3.6.2) consistently crashes, with no error message, whenever I call chromatogram(), nChrom() or tic() on an object returned by openMSfile().

Example:

require(mzR)
file <- '000_select.mzXML'
tt <- openMSfile(file)

head(tic(tt))

The file "000_select.mzXML" is part of the datasets accompanying the book "Molecular Data Analysis Using R" by Ortutay and Ortutay.

sneumann commented 4 years ago

Hi, thanks for reporting. Does the same happen with the mzML files in the msdata Bioconductor package? If those work, the problem seems to be caused by this particular file. Can the ProteoWizard msaccess tool (I think that's the one) read '000_select.mzXML' successfully? If yes, the problem is back with mzR :-(. In that case, could you attach '000_select.mzXML' to this issue, if it is small enough? Yours, Steffen

lgatto commented 4 years ago

I can confirm that the msdata files work fine (I tested some of them earlier this week during a course). I am happy to test the 000_select.mzXML file once we get it.

drag05 commented 4 years ago

Thank you for the prompt response!

The data was read correctly, and peaks(), spectra(), header() etc. worked. The chromatogram() examples given in the mzR manual also work well. This file, however, crashes both the R console and RStudio. Please find the file attached:

000_select.zip

lgatto commented 4 years ago

Indeed

> library("mzR")
Loading required package: Rcpp
> ms <- openMSfile("000_select.mzXML")
> ms
Mass Spectrometry file handle.
Filename:  000_select.mzXML 
Number of scans:  7 
> length(ms)
[1] 7
> header(ms)
*** output flushed ***
> dim(header(ms))
[1]  7 31
> str(peaks(ms))
List of 7
 $ : num [1:686, 1:2] 302 303 305 306 307 ...
 $ : num [1:910, 1:2] 300 301 302 303 304 ...
 $ : num [1:1015, 1:2] 301 301 302 303 305 ...
 $ : num [1:1171, 1:2] 300 301 302 303 304 ...
 $ : num [1:1138, 1:2] 300 301 302 302 303 ...
 $ : num [1:1041, 1:2] 300 301 302 303 304 ...
 $ : num [1:1079, 1:2] 300 302 304 308 310 ...
> chromatogram(ms)

 *** caught segfault ***
address (nil), cause 'memory not mapped'

Traceback:
 1: .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x55d0e2380190>,     dll = list(name = "Rcpp", path = "/home/lgatto/R/x86_64-pc-linux-gnu-library/3.6/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x55d0e3d04360>,         info = <pointer: 0x55d0e113e970>), numParameters = -1L),     <pointer: 0x55d0e2a2e390>, <pointer: 0x55d0e4935fd0>, .pointer)
 2: object@backend$getLastChrom()
 3: nChrom(object)
 4: .local(object, ...)
 5: chromatogram(ms)
 6: chromatogram(ms)

Possible actions:
1: abort (with core dump, if enabled)
2: normal R exit
3: exit R without saving workspace
4: exit R saving workspace
Selection: 

The file contains only 7 spectra, and if it was modified manually, it might not actually be valid. Note that mscat (from ProteoWizard) does read it, which is consistent with what we have in R.

This was done with

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=fr_FR.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=fr_FR.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=fr_FR.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=fr_FR.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.20.0 Rcpp_1.0.3

loaded via a namespace (and not attached):
[1] compiler_3.6.2      ProtGenerics_1.19.3 tools_3.6.2        
[4] parallel_3.6.2      Biobase_2.46.0      codetools_0.2-16   
[7] ncdf4_1.17          BiocGenerics_0.32.0
drag05 commented 4 years ago

@lgatto

This is the warning message I get when loading mzR:

> require(mzR)
Loading required package: mzR
Loading required package: Rcpp
Warning message:
In fun(libname, pkgname) :
  mzR has been built against a different Rcpp version (1.0.2)
than is installed on your system (1.0.3). This might lead to errors
when loading mzR. If you encounter such issues, please send a report,
including the output of sessionInfo() to the Bioc support forum at 
https://support.bioconductor.org/. For details see also
https://github.com/sneumann/mzR/wiki/mzR-Rcpp-compiler-linker-issue.

Could the issue be the Rcpp version? If so, does this mean mzR will be updated soon?

Also,

> sessionInfo()
R version 3.6.2 (2019-12-12)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252   
etc.

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mzR_2.18.1 Rcpp_1.0.3

loaded via a namespace (and not attached):
[1] compiler_3.6.2      ProtGenerics_1.16.0 parallel_3.6.2     
[4] Biobase_2.44.0      codetools_0.2-16    ncdf4_1.17         
[7] BiocGenerics_0.30.0

Thank you!

drag05 commented 4 years ago

@lgatto

Also: why don't I get the message

*** caught segfault ***
address (nil), cause 'memory not mapped'

before R crashes? In my case, R just shuts down.

Is it because of the Rcpp version?

Thank you!

lgatto commented 4 years ago

You can ignore the warning; it has nothing to do with the error. I don't know why you don't see the same message; possibly a Windows vs. Linux difference.

drag05 commented 4 years ago

@lgatto That's exactly what I did: I ignored the warning.

Now, regarding the message:

Could it be that the message is OS-related? I would have thought it was more C/C++-related, as it mentions memory mapping, with R being OS-independent and all… The term nil is discussed here, for example: https://stackoverflow.com/questions/1683608/c-nil-vs-null

sneumann commented 4 years ago

Hi, I just realised this is mzXML. Can you convert the original file to mzML instead? I have no idea whether we support chromatogram() functionality for mzXML in the first place. Yours, Steffen

lgatto commented 4 years ago

It does indeed look as if mzR::chromatogram doesn't check whether the file contains chromatographic data, and crashes if there is none. At least, after converting to mzML, there's no segfault:

> ms <- openMSfile("~/Downloads/out/000.mzML")
> ms
Mass Spectrometry file handle.
Filename:  000.mzML 
Number of scans:  7 
> chromatogram(ms)
Error in .local(object, ...) : Index out of bound [1:0].

The solution to this is to use MSnbase.

> library(MSnbase)
> rw <- readMSData("~/Downloads/out/000.mzML", mode = "onDisk")
> chr <- chromatogram(rw)
> chr
Chromatograms with 1 row and 1 column
           000.mzML
     <Chromatogram>
[1,]      length: 7
phenoData with 1 variables
featureData with 1 variables
> plot(chr)
> rw <- readMSData("~/Downloads/000_select.mzXML", mode = "onDisk")
> chr <- chromatogram(rw)
> chr
Chromatograms with 1 row and 1 column
     000_select.mzXML
       <Chromatogram>
[1,]        length: 7
phenoData with 1 variables
featureData with 1 variables
> plot(chr)
drag05 commented 4 years ago

@sneumann, @lgatto:

Funny, because the aforementioned book uses mzR to read the file.

I quote from page 247 of this book:

"The mzR package (Chambers et al. 2012) from Bioconductor contains parsers for netCDF, mzXML, mzData, and mzML file formats"

and Bioconductor agrees:

"mzR ... comes with a wrapper for the ISB random access parser for mass spectrometry mzXML, mzData and mzML files"

To be fair, the book's authors only go as far as plotting the output of peaks(), without venturing into chromatograms. Maybe they leave that for later chapters.

So, what could be the solution to this, besides using another package or converting the data format?

Thank you!

lgatto commented 4 years ago

Well, first, the ISB parser is outdated: mzR has switched to ProteoWizard to parse the data.

The solution is to use MSnbase, which uses mzR under the hood.
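For readers who want to stay with mzR alone, a rough total-ion trace can be built from the spectra that did read correctly, since peaks() and header() both worked on this file. The sketch below sums the intensity column of each spectrum at one MS level, which is roughly what MSnbase's chromatogram() does with its default settings. The helper name tic_from_spectra() is hypothetical (it is not part of mzR). The sketch assumes that each element of the peak list is a two-column m/z/intensity matrix, as in the str(peaks(ms)) output above, and that retention times and MS levels are taken from the retentionTime and msLevel columns of header().

```r
# Hypothetical helper (not part of mzR): approximate a total ion chromatogram
# by summing the intensities of every spectrum at the requested MS level.
# peak_list: list of two-column matrices (m/z, intensity), e.g. from peaks(ms)
# rt, ms_level: vectors aligned with peak_list, e.g. header(ms)$retentionTime
#               and header(ms)$msLevel (assumed column names)
tic_from_spectra <- function(peak_list, rt, ms_level, level = 1L) {
  stopifnot(length(peak_list) == length(rt),
            length(rt) == length(ms_level))
  keep <- ms_level == level
  # Column 2 holds intensities; an empty spectrum sums to 0
  intensity <- vapply(peak_list[keep], function(p) sum(p[, 2]), numeric(1))
  data.frame(rtime = rt[keep], intensity = intensity)
}
```

With the file from this issue, a call would look like hdr <- header(ms) followed by tic_from_spectra(peaks(ms), hdr$retentionTime, hdr$msLevel). Unlike tic() and chromatogram(), this does not go through the file's chromatogram index (where the traceback above shows the crash), so it should avoid the segfault, though it is an approximation rather than a replacement for MSnbase.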

drag05 commented 4 years ago

OK! Thank you! I'll close this issue now.