sneumann / mzR

This is the git repository matching the Bioconductor package mzR: parser for netCDF, mzXML, mzData and mzML files (mass spectrometry data)
40 stars 26 forks source link

mzR::get3Dmap only return zeros #269

Open joerg-b opened 2 years ago

joerg-b commented 2 years ago

I've recently updated my computer (macOS Monterey, R 4.2, Bioconductor 3.15) and in the process I've also installed mzR. Now, while running some scripts that worked perfectly fine on R 3.6 and R 4.0, I'm encountering some problems. I've located the problem to originate from get3Dmap only returning a matrix of zeros. The dimensions of the matrix seem to be correct. When I use mzR::peaks on the same data file I get lot's of non-zero values.

I can provide a link to an example .mzML file if that helps.

Thanks for your help! Joerg

sneumann commented 2 years ago

Hi, I can confirm, the following "test" fails:

library(mzR)
ms <- openMSfile(list.files(system.file("microtofq", package = "msdata"), pattern="MM14.mzML",full.names=TRUE, recursive = TRUE))
threedee <- get3Dmap(ms, scans=1:100, lowMz=100, highMz=1000, resMz=1)
sum(threedee) > 1

on

mzR_2.30.0
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.4 LTS

It does work on

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.7 LTS
sneumann commented 2 years ago

It is called from https://github.com/sneumann/mzR/blob/458704d680a2eead5acb0a4aa224b6e48ff170a6/R/methods-mzRpwiz.R#L1

Actual calculation happens in C++: https://github.com/sneumann/mzR/blob/4745773a645c1953c24e944b9420eee7a4d9cedd/src/RcppPwiz.cpp#L990

My current suspicion is the updated pwiz version.

sneumann commented 2 years ago

So, some git bisect later:

2aed7135fee1fc23daa795b189bb62d335a3bbbf is the first bad commit
commit 2aed7135fee1fc23daa795b189bb62d335a3bbbf
Date:   Mon Oct 25 10:58:52 2021 +0200

        refactor: patially rewrite code to exctract header info

        - Create index list in R instead of C++.
        - Use `size_t` instead of `int` where suitable.
        - Ensure that correct data types are passed between R and C++.
        - Use new (own) implementation to extract the acquisition number from the
          spectrum ID instead of `id::translateNativeIDToScanNumber` from proteowizard
          to avoid segfaults on macOS (https://github.com/sneumann/xcms/issues/422).

 DESCRIPTION         |   2 +-
 NEWS                |   9 +++++
 R/methods-mzRpwiz.R |  10 +++---
 src/RcppPwiz.cpp    | 101 ++++++++++++++++++++++++++++++++++------------------
 src/RcppPwiz.h      |   8 +++--
 5 files changed, 89 insertions(+), 41 deletions(-)
jorainer commented 2 years ago

Uh - then it seems that's my fault. I'll have a look and fix.

jorainer commented 2 years ago

Sorry, super-stupid misspelling from my side. Wrote DetailLevel_FullMetadata instead of DetailLevel_FullData in the spectrum call, thus only header information were imported but no m/z and intensity values. It's fixed now in PR #269 that also adds a unit test for the get3Dmap function.

sneumann commented 2 years ago

Thank Jörg for reporting, thanks to git bisect run to identify the causal commit (my first guess was wrong), and Jo for the super fast fix. Pushed upstream as 2.31.1 Yours, Steffen