sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

segmentation fault when reading mzML scan header information #206

Open: PMSeitzer opened this issue 4 years ago

PMSeitzer commented 4 years ago

Here is the relevant section of the traceback:

*** caught segfault ***
address 0x0, cause 'unknown'
Traceback:
 1: .External(list(name = "CppMethod__invoke_notvoid", address = <pointer: 0x7ffb3840a240>,     dll = list(name = "Rcpp", path = "/Library/Frameworks/R.framework/Versions/3.5/Resources/library/Rcpp/libs/Rcpp.so",         dynamicLookup = TRUE, handle = <pointer: 0x7ffb4844aa90>,         info = <pointer: 0x7ffb504883c0>), numParameters = -1L),     <pointer: 0x7ffb385f4e40>, <pointer: 0x7ffb385f62a0>, .pointer)
 2: object@backend$getAllScanHeaderInfo()
 3: .local(object, ...)
 4: mzR::header(a_file)
 5: mzR::header(a_file)
 6: eval(lhs, parent, parent)
 7: eval(lhs, parent, parent)
 8: mzR::header(a_file) %>% dplyr::tbl_df() %>% dplyr::mutate(scan = 1:dplyr::n())

This error occurred when analyzing an mzML file generated by ProteoWizard from a Thermo RAW file. Here is an example of one of the scan headers:

          <scan>
              <cvParam cvRef="MS" accession="MS:1000016" name="scan start time" value="0.0032890465" unitCvRef="UO" unitAccession="UO:0000031" unitName="minute"/>
              <cvParam cvRef="MS" accession="MS:1000512" name="filter string" value="FTMS + p ESI Full ms [70.0000-1050.0000]"/>
              <cvParam cvRef="MS" accession="MS:1000616" name="preset scan configuration" value="1"/>
              <cvParam cvRef="MS" accession="MS:1000927" name="ion injection time" value="100.000001490116" unitCvRef="UO" unitAccession="UO:0000028" unitName="millisecond"/>
              <scanWindowList count="1">
                <scanWindow>
                  <cvParam cvRef="MS" accession="MS:1000501" name="scan window lower limit" value="70.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                  <cvParam cvRef="MS" accession="MS:1000500" name="scan window upper limit" value="1050.0" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
                </scanWindow>
              </scanWindowList>
            </scan>

This error appears to happen non-deterministically.

I suspect the issue lies in mzR's scan header parsing code: perhaps the scan header contains some unexpected or unhandled information that causes mzR to overflow a buffer.

lgatto commented 4 years ago

Can you reproduce the error with something like this?

test <- replicate(100, mzR::header(a_file))

PMSeitzer commented 4 years ago

I tried again; this time it failed deterministically:

a_file <- mzR::openMSfile(mzML_file)
Error: Can not open file <file-path>! Original error was: Error in pwizModule$open(filename): [SpectrumList_mzML::create()] Bad istream.

This appears to be the same problem as https://github.com/sneumann/xcms/issues/264

PMSeitzer commented 4 years ago

My mzR version is 2.16.2. I ran BiocManager::install("mzR") before testing to make sure I had the latest version.

jorainer commented 4 years ago

That mzR version is fairly old; the current stable release (from Bioconductor 3.9) is 2.18.1. Could you install a recent R version (3.6.1) and then install Bioconductor and mzR with it? You should then get the stable version mentioned above.

Francisco-madrid-gambin commented 4 years ago

Sorry to interrupt, but I get a similar error: "Original error was: Error in pwizModule$open(filename): boost::filesystem::path codecvt to wstring: error". However, I have the latest versions of mzR, RStudio and R (see below). Interestingly, I don't get the error when I use RawConverter to convert .RAW files to .mzXML, but I always get it with any mzXML file downloaded from the MetaboLights repository (e.g. https://www.ebi.ac.uk/metabolights/MTBLS103). So the same reading function works on the .mzXML file converted by RawConverter but fails on the downloaded one. Any idea what is happening?

Best regards,

F

Francisco-madrid-gambin commented 4 years ago

It seems that when a file is converted from the Bruker vendor format with CompassXport, the converter adds an extra line that breaks the file, so mzR cannot read it (the line is not present when ProteoWizard or RawConverter is used). The extra line that mzR cannot ignore is: <nameValue name="recalibrationTime" value=""/>

Any idea whether this is related to this issue?

sneumann commented 4 years ago

Hi, we have been using CXP (CompassXport) all the time; I will check whether our files include the line above. Which CXP version are you using? Which command-line options? Yours, Steffen

Francisco-madrid-gambin commented 4 years ago

Hi, the dataset was downloaded from MetaboLights (https://www.ebi.ac.uk/metabolights/MTBLS414), so it was converted by someone else. However, we found this information in the file: <software type="conversion" name="CompassXport" version="3.0.7"/>

It fails when you run:

mzR = mzR::openMSfile("C:/my_directory/1_0a.mzXML")
Error: Can not open file C:/my_directory/1_0a.mzXML! Original error was: Error in pwizModule$open(filename): [Serializer_mzXML::Handler_dataProcessing] Unexpected element name: nameValue
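The error message above points at a <nameValue> element inside <dataProcessing>. As a rough pre-check on other downloaded mzXML files, here is a minimal base-R sketch. `find_dataprocessing_namevalues` is a hypothetical helper, not part of mzR, and it assumes one tag per line (as in the CompassXport snippet quoted here); it is a plain text scan, not a real XML parse.

```r
# Hypothetical helper (not part of mzR): return the line numbers of <nameValue>
# elements that appear inside a <dataProcessing> block of an mzXML file.
# Assumes a line-oriented layout (one tag per line); this is a text scan,
# not an XML parse, so unusual formatting may be missed.
find_dataprocessing_namevalues <- function(lines) {
  inside <- FALSE
  hits <- integer(0)
  for (i in seq_along(lines)) {
    l <- lines[i]
    # Enter the block on an opening, non-self-closing <dataProcessing> tag
    if (grepl("<dataProcessing\\b", l, perl = TRUE) &&
        !grepl("<dataProcessing\\b[^>]*/>", l, perl = TRUE)) {
      inside <- TRUE
    }
    # Record any <nameValue> seen while inside the block
    if (inside && grepl("<nameValue\\b", l, perl = TRUE)) {
      hits <- c(hits, i)
    }
    # Leave the block on the closing tag
    if (grepl("</dataProcessing>", l, fixed = TRUE)) {
      inside <- FALSE
    }
  }
  hits
}

# Usage on a file (the path is a placeholder):
# bad <- find_dataprocessing_namevalues(readLines("C:/my_directory/1_0a.mzXML", warn = FALSE))
```

A non-empty result only means the file contains the kind of element that triggered the error above. Whether it is safe to delete such lines is a separate question, since indexed mzXML files store byte offsets that editing would shift.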

sneumann commented 4 years ago

Hi @Francisco-madrid-gambin, please avoid adding unrelated aspects to existing issue reports. Let's continue discussing your issue in #213. Yours, Steffen