sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)

open ABI WIFF files with mzR #259

Closed tnaake closed 2 years ago

tnaake commented 2 years ago

Hello,

I am currently trying to read mzML files converted from ABI wiff files using mzR/Spectra. My OS is Windows 10, and the ProteoWizard version used for the wiff conversion is 3.0.22015 (64-bit). Loading the mzML file under Ubuntu is not successful either (see below).

  1. Windows

Under Windows (mzR v2.28.0) I am using the following command to load the mzML file:

mzR::openMSfile("foo.mzML")
Error: Can not open file foo.mzML! Original error was: Error in pwizModule$open(filename): [IO::HandlerBinaryDataArray] Unknown binary data type.

The issue has also appeared in different flavors at https://github.com/lgatto/MSnbase/issues/517 and https://github.com/ProteoWizard/pwiz/issues/1150. Unfortunately, I cannot update my mzR version via BiocManager::install("sneumann/mzR", ref = "feature/updatePwiz_3_0_21263") because compilation fails for me under Windows, so I cannot test whether this branch fixes the issue.

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=English_Germany.65001  LC_CTYPE=C                       
[3] LC_MONETARY=English_Germany.65001 LC_NUMERIC=C                     
[5] LC_TIME=English_Germany.65001    
system code page: 65001

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8          pillar_1.6.4        compiler_4.1.2      BiocManager_1.30.16
 [5] ProtGenerics_1.26.0 prettyunits_1.1.1   remotes_2.4.2       tools_4.1.2        
 [9] ncdf4_1.19          digest_0.6.29       pkgbuild_1.3.1      evaluate_0.14      
[13] lifecycle_1.0.1     tibble_3.1.6        gtable_0.3.0        pkgconfig_2.0.3    
[17] rlang_0.4.12        rstudioapi_0.13     cli_3.1.1           curl_4.3.2         
[21] yaml_2.2.1          parallel_4.1.2      xfun_0.29           fastmap_1.1.0      
[25] gridExtra_2.3       withr_2.4.3         dplyr_1.0.7         stringr_1.4.0      
[29] knitr_1.37          generics_0.1.1      vctrs_0.3.8         rprojroot_2.0.2    
[33] grid_4.1.2          tidyselect_1.1.1    Biobase_2.54.0      glue_1.6.0         
[37] R6_2.5.1            processx_3.5.2      fansi_1.0.2         rmarkdown_2.11     
[41] mzR_2.28.0          purrr_0.3.4         callr_3.7.0         magrittr_2.0.1     
[45] codetools_0.2-18    BiocGenerics_0.40.0 ps_1.6.0            ellipsis_0.3.2     
[49] htmltools_0.5.2     utf8_1.2.2          stringi_1.7.6       crayon_1.4.2 

  2. Ubuntu

Under Ubuntu I was able to install mzR from the branch feature/updatePwiz_3_0_21263. I then continued to test if I can load the mzML file on Ubuntu (20.04):

> mzR::openMSfile("foo.mzML")
Mass Spectrometry file handle.
Filename:  foo.mzML
Number of scans:  0
> Spectra::Spectra("foo.mzML", backend = MsBackendMzR())
Error: BiocParallel errors
  1 remote errors, element index: 1
  0 unevaluated and other errors
  first remote error: different row counts implied by arguments

openMSfile returns no error, but Number of scans is 0, and the subsequent Spectra call fails.

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8          codetools_0.2-18    IRanges_2.28.0
 [4] MASS_7.3-55         stats4_4.1.2        MsCoreUtils_1.6.0
 [7] ncdf4_1.19          fs_1.5.2            S4Vectors_0.32.3
[10] mzR_2.29.2          BiocParallel_1.28.3 tools_4.1.2
[13] Spectra_1.4.1       Biobase_2.54.0      ProtGenerics_1.26.0
[16] parallel_4.1.2      compiler_4.1.2      BiocGenerics_0.40.0
[19] clue_0.3-60         cluster_2.1.2

I have attached the mzML files for reference (mzML_files.zip). For the entries at index 0 and 1, the binaryDataArrayList has length 3 (time array, intensity array, non-standard array). Removing the non-standard array entry and setting the length to 2 does not solve the problem:

> mzR::openMSfile("foo_cut.mzML")
Mass Spectrometry file handle.
Filename:  foo_cut.mzML
Number of scans:  0

I get the same output when I run the command under Windows with mzR v2.28.0.
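
To see which arrays each entry declares (such as the time, intensity and non-standard arrays mentioned above), the cvParam names attached to every binaryDataArray can be listed directly from the XML. This is a minimal sketch in Python using only the standard library. It is independent of mzR, and the function name `list_binary_array_params` is made up for illustration. It only reads cvParam elements written inline, so parameters pulled in through a referenceableParamGroupRef would not show up.

```python
import xml.etree.ElementTree as ET


def list_binary_array_params(path):
    """Return (parent id, [cvParam names]) for each <binaryDataArray>.

    Hypothetical helper for illustration; not part of mzR or ProteoWizard.
    """
    def local(tag):
        # Drop the "{namespace}" prefix that ElementTree adds to tag names.
        return tag.rsplit("}", 1)[-1]

    result = []
    # Stream the file so large mzML files are not held in memory at once.
    for _, elem in ET.iterparse(path, events=("end",)):
        if local(elem.tag) in ("spectrum", "chromatogram"):
            for child in elem.iter():
                if local(child.tag) == "binaryDataArray":
                    # Only inline cvParam children are collected here.
                    names = [p.get("name") for p in child
                             if local(p.tag) == "cvParam"]
                    result.append((elem.get("id"), names))
            elem.clear()  # release the encoded binary payload
    return result
```

Calling `list_binary_array_params("foo.mzML")` would give one tuple per array, which may help to spot an array type or data type that an older parser does not recognise.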

Could you help me identify the source of the error?

Many thanks!

jorainer commented 2 years ago

I had a look at the files: they contain chromatograms, not spectra. You therefore cannot read them with Spectra (or with the mzR::header and mzR::peaks functions). You should be able to read the data with readSRMData from MSnbase, which returns an MChromatograms object. Here too, reading the foo.mzML file requires the newer mzR/ProteoWizard version. The foo_cut.mzML file can be read with the regular mzR.
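
To check up front whether an mzML file holds spectra or chromatograms, one could count the two element types directly in the XML. The following is a minimal sketch in Python using only the standard library. It is independent of mzR, and the function name `count_mzml_entries` is made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_mzml_entries(path):
    """Count <spectrum> and <chromatogram> elements in an mzML file.

    Hypothetical helper for illustration; not part of mzR or MSnbase.
    """
    counts = Counter()
    # Stream the file so large mzML files are not held in memory at once.
    for _, elem in ET.iterparse(path, events=("end",)):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop the XML namespace prefix
        if tag in ("spectrum", "chromatogram"):
            counts[tag] += 1
            elem.clear()  # release the encoded binary payload
    return {"spectrum": counts["spectrum"],
            "chromatogram": counts["chromatogram"]}
```

For a file that holds only chromatograms, the spectrum count would be expected to be 0, which is consistent with the "Number of scans: 0" reported by openMSfile.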

I haven't had the chance to work on the Chromatograms package for a long time now, but once finished, that package should be the counterpart of Spectra for chromatographic data.

Side note: I suggest using the ProteoWizard Docker image for conversion to get reliable and reproducible results. I'm using it on our cluster to convert our Sciex wiff files. You can find some information here.

tnaake commented 2 years ago

Hi @jorainer

many thanks for the prompt reply and fix. Works now!

I was wondering whether it would help in the future if the man page of openMSfile stated that the function takes its information from the spectrum/spectrumList entries. Currently, at least for me, it is unclear which kinds of mzML files openMSfile is able to read.

I will close the issue then - many thanks again for the help :)