rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/
37 stars 25 forks source link

dropNaSpectraVariables drop m/z and intensity #138

Closed michaelwitting closed 4 years ago

michaelwitting commented 4 years ago

Dear RforMassSpectrometry-Team,

when reading MS2 spectra from .mgf or Massbank files the dropNaSpectraVariables removes values in mz and intensity. Any ideas?

Best regards,

Michael

Example file attached

ms2_spectra <- Spectra("prx-5_1_1-A,4_01_35913.mgf",
                                       source = MsBackendMgf(),
                                       backend = MsBackendDataFrame())

mz(dropNaSpectraVariables(ms2_spectra))

prx-5_1_1-A,4_01_35913.zip

sgibb commented 4 years ago

@michaelwitting thanks for reporting this bug!

MsBackendMgf inherits this method from MsBackend. A MWE:

d <- DataFrame(msLevel = 1L, scanIndex = 1L)
d$mz <- list(c(1:3))
d$intensity <- list(c(4:6))

r <- Spectra(d)
mz(dropNaSpectraVariables(r))
sgibb commented 4 years ago

spectraVariables,Spectra removes mz and intensity:

https://github.com/rformassspectrometry/Spectra/blob/0c505e159a72ce030fc4b86f9dc05e93340d6404/R/Spectra.R#L1431-L1434

spectraVariables(r@backend)
#  [1] "msLevel"                 "rtime"
#  [3] "acquisitionNum"          "scanIndex"
#  [5] "mz"                      "intensity"
#  [7] "dataStorage"             "dataOrigin"
#  [9] "centroided"              "smoothed"
# [11] "polarity"                "precScanNum"
# [13] "precursorMz"             "precursorIntensity"
# [15] "precursorCharge"         "collisionEnergy"
# [17] "isolationWindowLowerMz"  "isolationWindowTargetMz"
# [19] "isolationWindowUpperMz"
spectraVariables(r)
#  [1] "msLevel"                 "rtime"
#  [3] "acquisitionNum"          "scanIndex"
#  [5] "dataStorage"             "dataOrigin"
#  [7] "centroided"              "smoothed"
#  [9] "polarity"                "precScanNum"
# [11] "precursorMz"             "precursorIntensity"
# [13] "precursorCharge"         "collisionEnergy"
# [15] "isolationWindowLowerMz"  "isolationWindowTargetMz"
# [17] "isolationWindowUpperMz"

So many methods that are relying on spectraVariables could fail/be affected (spectraData, dropNaSpectraVariables, ...).

@jorainer Is there any reason why spectraVariables,MsBackendDataFrame is different from spectraVariables,Spectra?

jorainer commented 4 years ago

Is there any reason why spectraVariables,MsBackendDataFrame is different from spectraVariables,Spectra?

spectraVariables,Spectra returns all spectra variables except "mz" and "intensity" while the same for MsBackend should return all of them. The reason: users will use the Spectra object and might/will call spectraData,Spectra to get multiple columns at once - but we might not want to return them a very heavy DataFrame that contains also the m/z and intensity values. It would be still possible, by default they are however not returned. The spectraData,MsBackend on the other hand should (by default) return all spectra variables. This was a design decision @lgatto and I made some time ago.

Possible solutions: 1) leave it as it is. 2) spectraVariables,Spectra and spectraVariables,MsBackend return all variables except m/z and intensity and hence spectraData on Spectra and MsBackend would do the same. We would then have to ensure that methods like setBackend do in fact get spectra data including m/z and intensity values from the backend (and I don't know in what other places we have to fix this then too).

Open to any alternative suggestion to make that cleaner/clearer.

lgatto commented 4 years ago

The MRE above looks like a bug to me. From the documentation:

   • ‘dropNaSpectraVariables’: removes spectra variables (i.e.
     columns in the object's ‘spectraData’ that contain only
     missing values (‘NA’). Note that while columns with only
     ‘NA’s are removed, a ‘spectraData’ call after
     ‘dropNaSpectraVariables’ might still show columns containing
     ‘NA’ values for _core_ spectra variables.

According to this, non-core columns that contain only NA values are dropped. mz (irrespective of its core status, that might or might not apply depending on whether we consider the Spectra object itself or its backend) isn't NA only and shouldn't be dropped.

What am I missing?

jorainer commented 4 years ago

Yes, that the function drops mz or intensity is definitely a bug I will fix.

jorainer commented 4 years ago

The other issue is that spectraVariables,Spectra and spectraVariables,MsBackend return different variables, which is indeed confusing. I will also look into that and try to fix.

lgatto commented 4 years ago

The other issue is that spectraVariables,Spectra and spectraVariables,MsBackend return different variables, which is indeed confusing. I will also look into that and try to fix.

It might be confusing from the outside, but I think it makes sense knowing that these object handle different parts of the data. I don't think this is a problem at all - it actually highlights the difference, which is an important design choice.

jorainer commented 4 years ago

OK, but then I will highlight this even more in the documentation of the MsBackend.

jorainer commented 4 years ago

It's fixed in 0.99.7 (also pushed to BioC, so should be available there also within the next few days). Thanks @michaelwitting for reporting!

lgatto commented 4 years ago

Thanks!

michaelwitting commented 4 years ago

Thanks!