lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Error by using readSRMData: Unknown binary data type. #517

Closed luuulu closed 9 months ago

luuulu commented 4 years ago

Hi,

I get the following error when using the readSRMData function:

Error: Can not open file C:\Users\DataBlank-Test_Blank.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.

My code:

    library(MSnbase)
    data <- readSRMData("C:/Users/DataBlank-Test_Blank.mzML", pdata = NULL)

The data is LC-MRM data from a Sciex Triple Quad. I converted the .wiff file using ProteoWizard MSConvert.

I attached the .wiff and .wiff.scan files as well as the converted .mzML file: MS_Data.zip

What could be causing this error? Many thanks in advance for any help!

jorainer commented 4 years ago

I had a look at the mzML file and it seems to be incomplete. Maybe pwiz crashed during conversion? The last two lines are:

   <run id="DataBlank-Test_Blank" defaultInstrumentConfigurationRef="IC1" startTimeStamp="2020-06-24T09:02:51Z" defaultSourceFileRef="WIFF">
      <spectrumList count="1005" defaultDataProcessingRef="pwiz_Reader_ABI_conversion">
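
A file that stops after an opening tag like this is not well-formed XML, so truncation can be detected before the file ever reaches readSRMData. A minimal Python sketch, assuming only the standard library; the helper name `is_well_formed_xml` and the two sample strings are made up for illustration:

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(text):
    """Return True if text parses as a complete XML document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Raised, for example, when closing tags are missing.
        return False
    return True


# Illustrative stand-ins for a complete and a truncated mzML file.
complete = '<run id="r1"><spectrumList count="0"></spectrumList></run>'
truncated = '<run id="r1"><spectrumList count="0">'

print(is_well_formed_xml(complete))   # True
print(is_well_formed_xml(truncated))  # False
```

For a file on disk, `ET.parse(path)` raises the same `ParseError` when the document is cut off.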

luuulu commented 4 years ago

Thank you for the fast response! Ouch, something went wrong even though I got no error message in the ProteoWizard MSConvert GUI. Sorry for the inconvenience!

I converted it again and now it looks better, but I still get the same error when using readSRMData in R. Data.zip

I would really appreciate your help!

jorainer commented 4 years ago

The error message actually comes from the pwiz library, which mzR uses to read data from mzML files. What I find a little strange is that you have 3 arrays for each chromatogram: retention time, intensity and MS level. In the test data that I have (and that I can import without problems), the last one (MS level) is not present.

luuulu commented 4 years ago

Thanks for looking into it!

Could the conversion of the MRM data (.wiff format) to mzML be causing the problem? Should I use specific settings for the conversion? So far, I have used MSConvert with the default settings.

jorainer commented 4 years ago

Sorry, I don't know; I have never converted MRM data from wiff to mzML myself.

luuulu commented 4 years ago

Okay, no worries. But what would you suggest I do next? Should I open an issue on pwiz, since the error comes from the pwiz library?

jorainer commented 4 years ago

That would be an option - or maybe @sneumann knows more about this?

sneumann commented 4 years ago

Hm, nothing obviously suspicious in the file. TIC, BPC and one SRM. But the TIC and BPC have

            <binaryDataArray arrayLength="3434" encodedLength="68">
              <cvParam cvRef="MS" accession="MS:1000522" name="64-bit integer" value=""/>
              <cvParam cvRef="MS" accession="MS:1000574" name="zlib compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000786" name="non-standard data array" value="ms level" unitCvRef="UO" unitAccession="UO:0000186" unitName="dimensionless unit"/>
              <binary>eJztwTEBAAAAwqD1T20MH6AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAOBna1AAAQ==</binary>
            </binaryDataArray>
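
An array like this can be inspected outside of R. A minimal Python sketch, assuming the payload is base64 text wrapping zlib-compressed, little-endian 64-bit integers, as the cvParams above indicate (MS:1000522, MS:1000574); both function names are made up for illustration, and the round trip uses invented values rather than data from the attached file:

```python
import base64
import struct
import zlib


def decode_int64_array(b64_text, compressed=True):
    """Decode a <binary> payload holding little-endian 64-bit integers."""
    raw = base64.b64decode(b64_text)
    if compressed:
        raw = zlib.decompress(raw)
    if len(raw) % 8 != 0:
        raise ValueError("payload length is not a multiple of 8 bytes")
    count = len(raw) // 8
    return list(struct.unpack("<%dq" % count, raw))


def encode_int64_array(values):
    """Inverse of decode_int64_array, used here only to build a test payload."""
    raw = struct.pack("<%dq" % len(values), *values)
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


# Round trip with invented MS levels.
payload = encode_int64_array([1, 1, 2])
print(decode_int64_array(payload))  # [1, 1, 2]
```

If the `<binary>` text above is intact, decoding it this way should yield 3434 values, matching the declared arrayLength.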

I removed them manually, and now get a different error quite deep down: error in evaluating the argument 'x' in selecting a method for function 'is.unsorted': incorrect number of dimensions

> traceback()
13: h(simpleError(msg, call))
12: .handleSimpleError(function (cond) 
    .Internal(C_tryCatchHelper(addr, 1L, cond)), "incorrect number of dimensions", 
        base::quote(chr_data[[i]][, 1]))
11: is.unsorted(rtime)
10: Chromatogram(rtime = chr_data[[i]][, 1], intensity = chr_data[[i]][, 
        2], precursorMz = hdr$precursorIsolationWindowTargetMZ[i], 
        productMz = hdr$productIsolationWindowTargetMZ[i], fromFile = idx)
9: (function (file, hdr, idx) 
   {
       current_ids <- paste0(.polarity_char(hdr$polarity), " Q1=", 
           hdr$precursorIsolationWindowTargetMZ, " Q3=", hdr$productIsolationWindowTargetMZ, 
           " collisionEnergy=", hdr$precursorCollisionEnergy)
       if (length(current_ids) != length(unique(current_ids))) 
           warning("file ", basename(file), " contains multiple ", 
               "chromatograms with identical polarity, precursor ", 
               "and product m/z values", call. = FALSE)
       res_chrs <- replicate(nrow(fdata), Chromatogram(fromFile = idx))
       msf <- .openMSfile(file)
       chr_data <- chromatogram(msf, hdr$chromatogramIndex)
       close(msf)
       for (i in seq_len(nrow(hdr))) {
           idx_to_place <- which(lengths(res_chrs) == 0 & fdata_ids == 
               current_ids[i])[1]
           if (is.na(idx_to_place)) 
               stop("Got more redundant chromatograms than expected")
           res_chrs[[idx_to_place]] <- Chromatogram(rtime = chr_data[[i]][, 
               1], intensity = chr_data[[i]][, 2], precursorMz = hdr$precursorIsolationWindowTargetMZ[i], 
               productMz = hdr$productIsolationWindowTargetMZ[i], 
               fromFile = idx)
       }
       res_chrs
   })(dots[[1L]][[1L]], dots[[2L]][[1L]], dots[[3L]][[1L]])
8: mapply(FUN = FUN, ...)
7: eval(mc, env)
6: eval(mc, env)
5: eval(mc, env)
4: standardGeneric("mapply")
3: mapply(files, hdr_list, seq_along(files), FUN = function(file, 
       hdr, idx) {
       current_ids <- paste0(.polarity_char(hdr$polarity), " Q1=", 
           hdr$precursorIsolationWindowTargetMZ, " Q3=", hdr$productIsolationWindowTargetMZ, 
           " collisionEnergy=", hdr$precursorCollisionEnergy)
       if (length(current_ids) != length(unique(current_ids))) 
           warning("file ", basename(file), " contains multiple ", 
               "chromatograms with identical polarity, precursor ", 
               "and product m/z values", call. = FALSE)
       res_chrs <- replicate(nrow(fdata), Chromatogram(fromFile = idx))
       msf <- .openMSfile(file)
       chr_data <- chromatogram(msf, hdr$chromatogramIndex)
       close(msf)
       for (i in seq_len(nrow(hdr))) {
           idx_to_place <- which(lengths(res_chrs) == 0 & fdata_ids == 
               current_ids[i])[1]
           if (is.na(idx_to_place)) 
               stop("Got more redundant chromatograms than expected")
           res_chrs[[idx_to_place]] <- Chromatogram(rtime = chr_data[[i]][, 
               1], intensity = chr_data[[i]][, 2], precursorMz = hdr$precursorIsolationWindowTargetMZ[i], 
               productMz = hdr$productIsolationWindowTargetMZ[i], 
               fromFile = idx)
       }
       res_chrs
   })
2: unlist(mapply(files, hdr_list, seq_along(files), FUN = function(file, 
       hdr, idx) {
       current_ids <- paste0(.polarity_char(hdr$polarity), " Q1=", 
           hdr$precursorIsolationWindowTargetMZ, " Q3=", hdr$productIsolationWindowTargetMZ, 
           " collisionEnergy=", hdr$precursorCollisionEnergy)
       if (length(current_ids) != length(unique(current_ids))) 
           warning("file ", basename(file), " contains multiple ", 
               "chromatograms with identical polarity, precursor ", 
               "and product m/z values", call. = FALSE)
       res_chrs <- replicate(nrow(fdata), Chromatogram(fromFile = idx))
       msf <- .openMSfile(file)
       chr_data <- chromatogram(msf, hdr$chromatogramIndex)
       close(msf)
       for (i in seq_len(nrow(hdr))) {
           idx_to_place <- which(lengths(res_chrs) == 0 & fdata_ids == 
               current_ids[i])[1]
           if (is.na(idx_to_place)) 
               stop("Got more redundant chromatograms than expected")
           res_chrs[[idx_to_place]] <- Chromatogram(rtime = chr_data[[i]][, 
               1], intensity = chr_data[[i]][, 2], precursorMz = hdr$precursorIsolationWindowTargetMZ[i], 
               productMz = hdr$productIsolationWindowTargetMZ[i], 
               fromFile = idx)
       }
       res_chrs
   }), use.names = FALSE)
1: readSRMData("DataBlank-Test_Blank.mzML", pdata = NULL)

So the code is still unhappy with that file. Any guesses, @luuulu? Yours, Steffen
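
The manual removal described above could be scripted. A rough Python sketch, assuming the extra arrays are always tagged with accession MS:1000786 as in the snippet; the function names and the small XML fragment are made up for illustration and are not the real file:

```python
import xml.etree.ElementTree as ET

NON_STANDARD = "MS:1000786"  # "non-standard data array" accession from the snippet


def local_name(tag):
    """Strip a '{namespace}' prefix so this works with or without xmlns."""
    return tag.rsplit("}", 1)[-1]


def strip_non_standard_arrays(root):
    """Remove binaryDataArray elements tagged MS:1000786; return the number removed."""
    removed = 0
    # Materialise the iterator first so removing children is safe.
    for parent in list(root.iter()):
        if local_name(parent.tag) != "binaryDataArrayList":
            continue
        for array in list(parent):
            accessions = {cv.get("accession") for cv in array
                          if local_name(cv.tag) == "cvParam"}
            if NON_STANDARD in accessions:
                parent.remove(array)
                removed += 1
        # Keep the declared count in step with the remaining arrays.
        parent.set("count", str(len(parent)))
    return removed


# Illustrative fragment only; a real mzML chromatogram carries more metadata.
fragment = """
<chromatogram id="TIC">
  <binaryDataArrayList count="3">
    <binaryDataArray><cvParam accession="MS:1000595" name="time array"/><binary/></binaryDataArray>
    <binaryDataArray><cvParam accession="MS:1000515" name="intensity array"/><binary/></binaryDataArray>
    <binaryDataArray><cvParam accession="MS:1000786" name="non-standard data array"/><binary/></binaryDataArray>
  </binaryDataArrayList>
</chromatogram>
"""
root = ET.fromstring(fragment)
print(strip_non_standard_arrays(root))  # 1
```

Writing the modified tree back changes byte offsets, so the offset index of an indexed mzML file would no longer match and would need to be regenerated. As the traceback shows, removing the arrays does not by itself make readSRMData succeed on this file.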

luuulu commented 4 years ago

Many thanks for your response! I don't know anything more about the new error you got, and it is also not clear to me why there is a non-standard data array in my mzML files. :/

luuulu commented 4 years ago

I have now opened an issue at pwiz (https://github.com/ProteoWizard/pwiz/issues/1133); hopefully they know what causes the error.

chambm commented 4 years ago

The integer data arrays for ms level on the TIC chromatogram make it easy for readers so inclined to split the chromatogram into per-MS-level versions. That error would be fixed by updating the pwiz version (in retrospect we probably should have ignored and warned about unknown binary arrays). The is.unsorted error doesn't seem pwiz related.
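
The per-MS-level split mentioned above amounts to grouping the time and intensity arrays by the parallel ms level array. A minimal Python sketch, assuming the three arrays have already been decoded into equal-length lists; the function name and all values are invented for illustration:

```python
from collections import defaultdict


def split_by_ms_level(rtime, intensity, ms_level):
    """Group (rtime, intensity) points by the parallel ms_level array."""
    if not (len(rtime) == len(intensity) == len(ms_level)):
        raise ValueError("all three arrays must have the same length")
    groups = defaultdict(lambda: {"rtime": [], "intensity": []})
    for rt, inten, level in zip(rtime, intensity, ms_level):
        groups[level]["rtime"].append(rt)
        groups[level]["intensity"].append(inten)
    return dict(groups)


# Made-up values; not taken from the attached file.
per_level = split_by_ms_level(
    rtime=[0.1, 0.2, 0.3, 0.4],
    intensity=[10.0, 5.0, 12.0, 6.0],
    ms_level=[1, 2, 1, 2],
)
print(per_level[1]["rtime"])  # [0.1, 0.3]
```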

luuulu commented 4 years ago

Thanks @chambm for your response!

@jorainer and @sneumann: Would it be feasible to update the pwiz library? I would love to be able to use MSnbase for my data analysis!

jorainer commented 4 years ago

That would have to be done in the mzR package by @sneumann, and it's a delicate process.

sneumann commented 4 years ago

Hi, that update is quite overdue, but my last attempt was a failure. I would certainly need help from a C++ wizard here. Yours, Steffen

luuulu commented 4 years ago

Unfortunately, I cannot help with C++. :/

Would it be feasible to do the update in the next few weeks? Otherwise I will have to look for another solution, as I need to process a large MRM data set.

chambm commented 4 years ago

I can certainly help with any pwiz build errors that pop up.

Danissss commented 4 years ago

Hi, any updates on this issue? Thanks