plantton closed this issue 3 years ago.
Here is what I have. It is a slightly different approach from yours, which also looks correct:
Load the data:
> suppressPackageStartupMessages({
+ library(Spectra)
+ library(PSM)
+ })
> quantFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
+                  full.names = TRUE, pattern = "mzXML$")
> identFile <- dir(system.file(package = "MSnbase", dir = "extdata"),
+                  full.names = TRUE, pattern = "dummyiTRAQ.mzid")
> sp <- Spectra(quantFile, backend = MsBackendDataFrame())
> id <- filterPSMs(readPSMs(identFile))
Make sure the identification IDs and the scan IDs can be matched:
> sp$spectrumId
[1] "controllerType=0 controllerNumber=1 scan=1"
[2] "controllerType=0 controllerNumber=1 scan=2"
[3] "controllerType=0 controllerNumber=1 scan=3"
[4] "controllerType=0 controllerNumber=1 scan=4"
[5] "controllerType=0 controllerNumber=1 scan=5"
> sp$spectrumId <- sub("^.+=1 ", "", sp$spectrumId)
> sp$spectrumId
[1] "scan=1" "scan=2" "scan=3" "scan=4" "scan=5"
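The `sub()` call above works because the prefix always ends in `controllerNumber=1 `. A slightly more explicit alternative, shown only as a base-R sketch, keeps just the trailing `scan=<number>` token whatever precedes it. The example IDs are made up, and this is not required for the data above.

```r
# Example native IDs (made up; the second one has a multi-digit scan number)
ids <- c("controllerType=0 controllerNumber=1 scan=1",
         "controllerType=0 controllerNumber=1 scan=12")

# Capture the final "scan=<digits>" token and drop everything before it;
# strings that do not end in such a token are returned unchanged
scan_ids <- sub("^.*(scan=[0-9]+)$", "\\1", ids)
scan_ids
```

Either way, the point is that `sp$spectrumId` and `id$spectrumID` must contain identical strings for the join to find matches.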
Join and check:
> sp2 <- joinSpectraData(sp, id, by.y = "spectrumID")
> spectraData(sp2)[, c("sequence", "spectrumId")]
DataFrame with 5 rows and 2 columns
           sequence  spectrumId
        <character> <character>
1 VESITARHGEVLQLRPK      scan=1
2     IDGQWVTHQWLKK      scan=2
3                NA      scan=3
4                NA      scan=4
5           LVILLFR      scan=5
> id[, c("sequence", "spectrumID", "rank")]
DataFrame with 3 rows and 3 columns
           sequence  spectrumID      rank
        <character> <character> <integer>
1     IDGQWVTHQWLKK      scan=2         1
2 VESITARHGEVLQLRPK      scan=1         1
3           LVILLFR      scan=5         1
I think the difference is that you first filtered the mzid file with filterPSMs and then passed the resulting DFrame, which has no duplicates, to joinSpectraData. But in the MSnbase vignette, addIdentificationData accepts identification data with duplicates. So with the new join method, we are expected to filter the identification data before joining it to the Spectra object.
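To illustrate why filtering matters, here is a toy left join in plain base R (data.frames, no Spectra objects). It assumes, based on my reading of the `joinSpectraData` help page, that duplicated keys in `y` raise a warning and only the last occurrence is kept; `left_join_last()` is a hypothetical stand-in for that rule, not Spectra code, and the tables and sequences are made up.

```r
# Hypothetical toy tables: three spectra and three PSMs, two of them for scan=1
spectra <- data.frame(spectrumId = paste0("scan=", 1:3),
                      stringsAsFactors = FALSE)
psms <- data.frame(spectrumID = c("scan=1", "scan=1", "scan=2"),
                   sequence   = c("PEPA", "PEPB", "PEPC"),
                   rank       = c(1L, 2L, 1L),
                   stringsAsFactors = FALSE)

# Hypothetical helper: left join of y onto x that keeps only the LAST row of y
# for each duplicated key (assumed to mirror joinSpectraData's rule)
left_join_last <- function(x, y, by.x, by.y) {
  if (anyDuplicated(y[[by.y]]))
    warning("duplicated keys in y; keeping the last occurrence")
  y <- y[!duplicated(y[[by.y]], fromLast = TRUE), , drop = FALSE]
  idx <- match(x[[by.x]], y[[by.y]])  # NA where a spectrum has no PSM
  for (v in setdiff(names(y), by.y)) x[[v]] <- y[[v]][idx]
  x
}

# Unfiltered PSMs: scan=1 gets the rank-2 match because it comes last
all_psms <- suppressWarnings(
  left_join_last(spectra, psms, "spectrumId", "spectrumID"))
all_psms$sequence

# Filtering to rank 1 first leaves one row per key, so the join drops nothing
rank1 <- left_join_last(spectra, psms[psms$rank == 1L, ],
                        "spectrumId", "spectrumID")
rank1$sequence
```

With the unfiltered table the sequence reported for scan=1 depends only on row order, which is one way two pipelines can disagree on the same spectrum.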
MSnbase is a different package. Please refer to the joinSpectraData documentation in the Spectra package.
joinSpectraData is a general function, and it's the user who chooses what to join.
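As an example of choosing what to join: instead of keeping only rank-1 PSMs, one could collapse all matches of a spectrum into a single row before joining. This is a base-R sketch on a made-up table; the PSM package has its own utilities for reducing PSM tables, which are not reproduced here.

```r
# Hypothetical toy PSM table with two matches for scan=1
psms <- data.frame(spectrumID = c("scan=1", "scan=1", "scan=2"),
                   sequence   = c("PEPA", "PEPB", "PEPC"),
                   stringsAsFactors = FALSE)

# Collapse all sequences of a spectrum into one ";"-separated string,
# giving exactly one row per spectrumID that can then be joined
collapsed <- aggregate(psms["sequence"], by = psms["spectrumID"],
                       FUN = function(s) paste(unique(s), collapse = ";"))
collapsed
```

Filtering keeps one match and discards the others; collapsing keeps all of them at the cost of a less convenient column. Which one is appropriate depends on the downstream analysis.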
When I use Spectra::joinSpectraData to reproduce the example from the MSnbase vignette, I find that the result is not the same as the original one.
Example from the MSnbase vignette:
To reproduce the example with Spectra:
Then I use the mzR backend to load quantFile:
We use spectrumID to match the identification DFrame to sps_ms, hence I changed the spectrumID in iddf:
Now I use joinSpectraData to add the identification data to the raw data:
The matched sequences for spectrum 1 (scan = 1) are different in the two results. Am I wrong in using joinSpectraData here?
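A quick way to see whether an identification table contains several rows per spectrum, before calling joinSpectraData, is to look for repeated keys. This base-R sketch uses a made-up data.frame as a stand-in for `iddf`.

```r
# Hypothetical stand-in for the identification table (iddf in the question)
iddf <- data.frame(spectrumID = c("scan=1", "scan=1", "scan=2", "scan=5"),
                   sequence   = c("PEPA", "PEPB", "PEPC", "PEPD"),
                   stringsAsFactors = FALSE)

# Keys that occur more than once; these cannot be joined one-to-one
dup_keys <- unique(iddf$spectrumID[duplicated(iddf$spectrumID)])
dup_keys

# Number of PSMs per spectrum, useful to decide how to filter or collapse
table(iddf$spectrumID)
```

If `dup_keys` is not empty, the identification data should be filtered or reduced first so that each spectrum maps to exactly one row.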