lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

not properly reading fdata on some MRM samples #486

Closed jmbadia closed 4 years ago

jmbadia commented 4 years ago

Hello, I am using MSnbase to read an mzML file (attached: 27076.1.zip) acquired by MRM on a QqQ. The file contains several chromatograms with identical polarity, precursor and product m/z values but different collision energies:

<chromatogram index="3" id="SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333"...>...<...name="collision energy" value="15.0">
<chromatogram index="4" id="SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667" ....>...<...name="collision energy" value="7.0">
<chromatogram index="5" id="SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667"...>...<...name="collision energy" value="25.0">

I cannot read their feature data correctly. fData() returns the data from the first chromatogram repeated three times, so I cannot get the fData of the other chromatograms (I specifically need their collision energy).

Chroms <- readSRMData("27076.1.mzML")
fData(Chroms)[5:7,c("chromatogramId","precursorCollisionEnergy")]

                                               chromatogramId
5 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
6 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
7 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
  precursorCollisionEnergy
5                       15
6                       15
7                       15

Thanks in advance

jorainer commented 4 years ago

For this I have to dig a little deeper into the respective (C++) code in the mzR package. I guess there might be a problem in extracting a unique chromatogram ID from the id string in the chromatogram header of the mzML file.

jorainer commented 4 years ago

So, mzR is reading the files correctly:

> head(mzR::chromatogramHeader(fd))
                                               chromatogramId chromatogramIndex
1                                                         TIC                 1
2 SRM SIC Q1=300 Q3=263.996 start=2.006183333 end=6.496216667                 2
3 SRM SIC Q1=300 Q3=281.996 start=2.006183333 end=6.496216667                 3
4 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333                 4
5      SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667                 5
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667                 6
  polarity precursorIsolationWindowTargetMZ precursorIsolationWindowLowerOffset
1       -1                               NA                                  NA
2        1                              300                                  NA
3        1                              300                                  NA
4        1                              380                                  NA
5        1                              380                                  NA
6        1                              380                                  NA
  precursorIsolationWindowUpperOffset precursorCollisionEnergy
1                                  NA                       NA
2                                  NA                       15
3                                  NA                       15
4                                  NA                       15
5                                  NA                        7
6                                  NA                       25
  productIsolationWindowTargetMZ productIsolationWindowLowerOffset
1                             NA                                NA
2                        263.996                                NA
3                        281.996                                NA
4                        263.996                                NA
5                        263.996                                NA
6                        263.996                                NA
  productIsolationWindowUpperOffset
1                                NA
2                                NA
3                                NA
4                                NA
5                                NA
6                                NA

So the problem is in fact in the MSnbase function that generates the Chromatograms object. I will have a look into that.
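The header output above already shows where the collision happens. As a minimal sketch in base R, the rows that share polarity, precursor m/z and product m/z can be flagged as follows. The data frame `hdr` is typed in by hand from the rows printed above, and its short column names are chosen here for illustration; it is not the object returned by mzR.

```r
# Values copied by hand from the chromatogramHeader() output shown above.
# The short column names are illustrative, not mzR's own.
hdr <- data.frame(
  index       = 1:6,
  polarity    = c(-1, 1, 1, 1, 1, 1),
  precursorMz = c(NA, 300, 300, 380, 380, 380),
  productMz   = c(NA, 263.996, 281.996, 263.996, 263.996, 263.996),
  ce          = c(NA, 15, 15, 15, 7, 25)
)

# Key built only from polarity, precursor m/z and product m/z.
key <- paste(hdr$polarity, hdr$precursorMz, hdr$productMz, sep = "_")

# Flag every row whose key occurs more than once.
shared <- key %in% key[duplicated(key)]
hdr[shared, c("index", "precursorMz", "productMz", "ce")]
```

Rows 4 to 6 are flagged: they agree in polarity, Q1 and Q3 and differ only in collision energy.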

jmbadia commented 4 years ago

Is this related to the warning that appears when you create the Chromatograms object?

readSRMData("27062.1.mzML")
Warning messages:
1: file 27062.1.mzML contains multiple chromatograms with identical polarity, precursor and product m/z values

jorainer commented 4 years ago

That is exactly the problem. I was generating a unique identifier for chromatograms based on the available metadata, but forgot to consider the collision energy. Fixes for this are in https://github.com/lgatto/MSnbase/pull/487. I hope @lgatto finds the time to merge this and push it to Bioconductor.
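To illustrate the idea (this is a sketch in base R, not the code from the pull request), the helper `make_key()` and the column names below are hypothetical. It pastes selected metadata columns into one string per chromatogram, once without and once with the collision energy.

```r
# Hypothetical helper, not MSnbase code: paste the chosen columns into one key.
make_key <- function(x, cols) {
  do.call(paste, c(unname(as.list(x[cols])), sep = "_"))
}

# The three chromatograms from the report: same polarity, Q1 and Q3.
srm <- data.frame(
  polarity    = c(1, 1, 1),
  precursorMz = c(380, 380, 380),
  productMz   = c(263.996, 263.996, 263.996),
  ce          = c(15, 7, 25)
)

key_without_ce <- make_key(srm, c("polarity", "precursorMz", "productMz"))
key_with_ce    <- make_key(srm, c("polarity", "precursorMz", "productMz", "ce"))

anyDuplicated(key_without_ce) > 0  # the keys collide
anyDuplicated(key_with_ce) > 0     # the keys are distinct
```

Without the collision energy all three chromatograms map to the same key, which would explain why the feature data of the first one was repeated.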

jmbadia commented 4 years ago

Thank you very much @jorainer, this definitely solves my problem.

Just to contribute something, here is a rare case that could still cause an error. If someone acquired different SRMs with identical metadata (Q1, Q3, polarity and CE) but different time ranges (very unusual, I guess), the chromatogramId would be the same for all of them (with a misleading start/end time). Why not use a separate identifier for each chromatogram in a file, instead of using its metadata to group chromatograms under one identifier?
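The edge case can be sketched in base R with made-up values (the data frame, its column names and the retention time windows are all hypothetical):

```r
# Hypothetical case: two SRMs identical in polarity, Q1, Q3 and CE,
# acquired over different retention time windows.
srm <- data.frame(
  polarity = c(1, 1),
  q1       = c(380, 380),
  q3       = c(263.996, 263.996),
  ce       = c(15, 15),
  rt_start = c(2.0, 7.0),
  rt_end   = c(6.5, 11.5)
)

# Key built from metadata only, collision energy included.
meta_key <- paste(srm$polarity, srm$q1, srm$q3, srm$ce, sep = "_")
anyDuplicated(meta_key) > 0    # metadata alone cannot separate the two

# A positional identifier is always unique within one file.
index_key <- seq_len(nrow(srm))
anyDuplicated(index_key) > 0
```

A positional identifier separates the two chromatograms within a file, but it would only line up across files if every file listed its chromatograms in the same order, which is an assumption that may not hold.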

jorainer commented 4 years ago

There were two main reasons. 1) I wanted to build the identifier from the variables that are returned for the chromatograms. The start and stop times are not (yet?) reported by mzR, and extracting them from the ID string can be tricky because I guess these IDs are vendor-specific (i.e. each vendor uses a different format). 2) I simply was not aware that something like this might happen. If you say this is something that can become common, we definitely have to look for ways to fix it.

Note: these identifiers are actually only used to match SRMs across mzML files.
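As a rough illustration of that matching step (a base R sketch with made-up per-file metadata, not MSnbase's implementation; `file_a`, `file_b` and `key()` are hypothetical names), a metadata-based key lets the same SRM be found in another file even when the chromatograms are stored in a different order:

```r
# Hypothetical metadata for two files holding the same SRMs in different order.
file_a <- data.frame(polarity = 1, q1 = c(300, 380, 380),
                     q3 = c(263.996, 263.996, 263.996), ce = c(15, 15, 7))
file_b <- data.frame(polarity = 1, q1 = c(380, 300, 380),
                     q3 = c(263.996, 263.996, 263.996), ce = c(7, 15, 15))

# Key pasted from polarity, Q1, Q3 and collision energy.
key <- function(x) paste(x$polarity, x$q1, x$q3, x$ce, sep = "_")

# For each SRM in file_a, the row of the same SRM in file_b (NA if absent).
idx <- match(key(file_a), key(file_b))
idx
```

Here `idx` gives, for every row of `file_a`, the row of `file_b` that carries the same key.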

For your problem: you could install the current developmental Bioconductor version by calling BiocManager::install(version = "devel"), followed by devtools::install_github("lgatto/MSnbase"). That will install the fixed version. Note that in one week from now this developmental Bioconductor version will be released as version 3.10.

jmbadia commented 4 years ago

Thanks for your detailed answer. No, such acquisition conditions seem hardly possible in practice. And I see your point: the identifier is just an ID, not a source of feature data. I will install your devel version and give you proper feedback.

jorainer commented 4 years ago

Thanks for the feedback! Any suggestions (or even better, contributions :) ) are highly welcome!

jmbadia commented 4 years ago

It works perfectly. Same Q1, Q3 and polarity + different CE => different chromatogramId and different characteristics. Thanks so much for your help and consideration. If I have any suggestions (or contributions!) I'll let you know.

                                               chromatogramId
5 SRM SIC Q1=380 Q3=263.996 start=2.002616667 end=6.492633333
6 SRM SIC Q1=380 Q3=263.996 start=2.005283333 end=6.495316667
7      SRM SIC Q1=380 Q3=263.996 start=2.0044 end=6.494416667
  chromatogramIndex polarity precursorIsolationWindowTargetMZ
5                 4        1                              380
6                 6        1                              380
7                 5        1                              380
  precursorIsolationWindowLowerOffset
5                                  NA
6                                  NA
7                                  NA
  precursorIsolationWindowUpperOffset precursorCollisionEnergy
5                                  NA                       15
6                                  NA                       25
7                                  NA                        7
  productIsolationWindowTargetMZ productIsolationWindowLowerOffset
5                        263.996                                NA
6                        263.996                                NA
7                        263.996                                NA
  productIsolationWindowUpperOffset
5                                NA
6                                NA
7                                NA