Closed jmbadia closed 4 years ago
For this I have to dig a little deeper into the respective (C++) code in the mzR package. I guess there might be some problem in extracting a unique chromatogram ID from the id string in the chromatogram header in the mzML file.
So, mzR is reading the files correctly:
> head(mzR::chromatogramHeader(fd))
chromatogramId chromatogramIndex
1 TIC 1
2 SRM SIC Q1=300 Q3=263.996 start=2.006183333 end=6.496216667 2
3 SRM SIC Q1=300 Q3=281.996 start=2.006183333 end=6.496216667 3
4 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333 4
5 SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667 5
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667 6
polarity precursorIsolationWindowTargetMZ precursorIsolationWindowLowerOffset
1 -1 NA NA
2 1 300 NA
3 1 300 NA
4 1 380 NA
5 1 380 NA
6 1 380 NA
precursorIsolationWindowUpperOffset precursorCollisionEnergy
1 NA NA
2 NA 15
3 NA 15
4 NA 15
5 NA 7
6 NA 25
productIsolationWindowTargetMZ productIsolationWindowLowerOffset
1 NA NA
2 263.996 NA
3 281.996 NA
4 263.996 NA
5 263.996 NA
6 263.996 NA
productIsolationWindowUpperOffset
1 NA
2 NA
3 NA
4 NA
5 NA
6 NA
So, the problem is in fact in the MSnbase function that generates the Chromatograms ... I will have a look into that.
Is this something related to the warning that appears when you create the Chromatograms object?
readSRMData("27062.1.mzML")
Warning messages:
1: file 27062.1.mzML contains multiple chromatograms with identical polarity, precursor and product m/z values
Exactly, that is the problem. I was generating a unique identifier for chromatograms based on the available metadata, but forgot to consider the collision energy. Fixes for this are in https://github.com/lgatto/MSnbase/pull/487 - hope @lgatto finds the time to merge this and push it to Bioconductor.
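To illustrate the collision described above, here is a minimal base R sketch. It is not the MSnbase implementation: `make_key` is a hypothetical helper, and the data frame only copies the metadata of the three affected chromatograms from the header output in this thread. It shows that a key built from polarity, precursor and product m/z alone is identical for all three rows, while adding the collision energy makes the keys distinct.

```r
# Metadata copied from the header output in this thread
# (same polarity, Q1 and Q3; different collision energy).
hdr <- data.frame(
  polarity = c(1, 1, 1),
  precursorIsolationWindowTargetMZ = c(380, 380, 380),
  productIsolationWindowTargetMZ = c(263.996, 263.996, 263.996),
  precursorCollisionEnergy = c(15, 7, 25)
)

# Hypothetical helper (an assumption, not MSnbase code):
# paste the selected columns row-wise into one key string.
make_key <- function(x, cols) {
  do.call(paste, c(unname(as.list(x[, cols, drop = FALSE])), sep = "_"))
}

old_cols <- c("polarity", "precursorIsolationWindowTargetMZ",
              "productIsolationWindowTargetMZ")
new_cols <- c(old_cols, "precursorCollisionEnergy")

key_old <- make_key(hdr, old_cols)
key_new <- make_key(hdr, new_cols)

anyDuplicated(key_old) > 0  # TRUE: all three rows share one key
anyDuplicated(key_new) > 0  # FALSE: collision energy separates them
```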
Thank you very much @jorainer, this definitely solves my problem.
Just to contribute something, I would raise a rare case that could still give an error. If someone ran different SRMs with identical metadata (Q1, Q3, polarity and CE) but with different time ranges (very unusual, I guess), the chromatogramId would be the same for all SRMs (with a misleading start/end time). Why don't you use an identifier for each chromatogram in a file instead of using its metadata to group them under a unique identifier?
The main reasons were that 1) I wanted to build the identifier based on the variables that are returned for the chromatograms. The start and stop times are not (yet?) reported by mzR, and extracting them from the ID string can be tricky because I guess this ID is vendor-specific (i.e. each vendor uses a different format). 2) I simply was not aware that something like this might happen. If you say that this is something that can become common, we definitely have to check for possibilities to fix that.
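As a rough illustration of why parsing the ID string is fragile, the base R sketch below pulls `name=value` pairs out of one ID string taken from this thread. `parse_chrom_id` is a hypothetical helper, and the `name=value` layout is an assumption based only on this one file; other vendors may format the ID differently, in which case the function simply finds nothing.

```r
# ID string copied from the chromatogram header shown in this thread.
id <- "SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333"

# Hypothetical helper: extract all "name=value" pairs as a named numeric vector.
# Assumes the vendor writes numeric values in exactly this "name=value" form.
parse_chrom_id <- function(id) {
  pairs <- regmatches(id, gregexpr("[A-Za-z0-9]+=[0-9.]+", id))[[1]]
  kv <- strsplit(pairs, "=", fixed = TRUE)
  vals <- as.numeric(vapply(kv, `[`, character(1), 2))
  names(vals) <- vapply(kv, `[`, character(1), 1)
  vals
}

parsed <- parse_chrom_id(id)
parsed[c("start", "end")]

# An ID without such pairs (e.g. the TIC) yields an empty vector.
parse_chrom_id("TIC")
```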
Note: these identifiers are actually only used to match SRMs across mzML files.
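What matching SRMs across files by such an identifier could look like is sketched below in base R. This is only an assumption about the general idea, not the MSnbase code: the two data frames, their column names (`q1`, `q3`, `ce`) and the `key` helper are hypothetical, with values modelled on this thread.

```r
# Hypothetical metadata of the same three SRMs stored in a different order
# in two files.
file_a <- data.frame(q1 = c(300, 380, 380),
                     q3 = c(263.996, 263.996, 263.996),
                     ce = c(15, 15, 7))
file_b <- data.frame(q1 = c(380, 380, 300),
                     q3 = c(263.996, 263.996, 263.996),
                     ce = c(7, 15, 15))

# Hypothetical key: precursor m/z, product m/z and collision energy.
key <- function(x) paste(x$q1, x$q3, x$ce, sep = "_")

# For each SRM in file_a, the row of the corresponding SRM in file_b.
idx <- match(key(file_a), key(file_b))
idx
```

If the collision energy were left out of the key, the two Q1=380 rows would share one key and `match` would map both to the same row of `file_b`.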
For your problem: you could install the current developmental Bioconductor version by calling BiocManager::install(version = "devel"), followed by devtools::install_github("lgatto/MSnbase"). That will install the fixed version. Note that one week from now this developmental Bioconductor version will be released as version 3.10.
Thanks for your detailed answer. No, it seems hardly possible in practice to have such working conditions. And I see your point: the identifier is just an ID, not a feature source. I will install your devel version and give you proper feedback.
Thanks for the feedback! Any suggestions (or even better, contributions :) ) are highly welcome!
It works perfectly. Same Q1, Q3, polarity + different CE => different chromatogramId & different characteristics. Thanks so much for your help & consideration. If I have any suggestion (or contribution!!) I'll let you know.
chromatogramId
5 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667
7 SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667
chromatogramIndex polarity precursorIsolationWindowTargetMZ
5 4 1 380
6 6 1 380
7 5 1 380
precursorIsolationWindowLowerOffset
5 NA
6 NA
7 NA
precursorIsolationWindowUpperOffset precursorCollisionEnergy
5 NA 15
6 NA 25
7 NA 7
productIsolationWindowTargetMZ productIsolationWindowLowerOffset
5 263.996 NA
6 263.996 NA
7 263.996 NA
productIsolationWindowUpperOffset
5 NA
6 NA
7 NA
Hello, I am reading an mzML file (attached: 27076.1.zip) with MSnbase, acquired by MRM from QqQ samples. It contains some chromatograms with identical polarity, precursor and product m/z values, but with different collision energies.
I cannot read their feature data correctly. fData() returns the data from the first chromatogram repeated three times, so I am not able to get the fData of the other chromatograms (I need their collision energy, specifically).
Thanks in advance