wkumler / RaMS

R-based access to Mass-Spectrometry data

request: parsing chromatograms from mzML acquired in MRM #6

Closed ricardo-cunha closed 1 year ago

ricardo-cunha commented 1 year ago

While enjoying how easy RaMS is to use, I have a request for a possible new feature.

mzML allows chromatograms to be stored, and when MS data files acquired in MRM mode (e.g., in the Sciex .wiff format) are converted to mzML (e.g., via ProteoWizard), the run results are stored as chromatograms, not spectra. It is possible to convert chromatograms to spectra during conversion to mzML in ProteoWizard, but repeated isolation m/z values then become an issue. I was wondering whether other chromatograms (besides TIC and BPC) in mzML could be obtained as new functionality in RaMS. Information for each chromatogram (i.e., name, precursor m/z and isolation m/z) is present and should be returned with the chromatogram data. However, I do not know exactly where it is stored in the mzML.
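
As a rough orientation for where that information usually sits: in mzML, each `chromatogram` element carries its name in the `id` attribute, the precursor and product m/z values as `isolation window target m/z` cvParams (accession MS:1000827) under `precursor` and `product`, and the time and intensity arrays as base64-encoded `binaryDataArray` entries. The sketch below is not RaMS code. It builds a tiny synthetic document with made-up values and parses it with xml2 and base64enc, assuming uncompressed 64-bit floats and no XML namespace. Real files declare a default namespace (which would need handling, for example with `xml2::xml_ns_strip()`) and may use zlib compression or 32-bit floats. The helper names `binary_array`, `isolation_window`, `decode_array` and `parse_chrom` are hypothetical.

```r
library(xml2)
library(base64enc)

# Helper (hypothetical): encode a numeric vector as little-endian 64-bit floats in base64
binary_array <- function(accession, values) {
  encoded <- base64encode(writeBin(values, raw(), size = 8, endian = "little"))
  paste0(
    "<binaryDataArray>",
    "<cvParam accession='MS:1000523' name='64-bit float'/>",
    "<cvParam accession='MS:1000576' name='no compression'/>",
    "<cvParam accession='", accession, "'/>",
    "<binary>", encoded, "</binary>",
    "</binaryDataArray>"
  )
}

# Helper (hypothetical): a precursor or product block holding the target m/z
isolation_window <- function(tag, mz) {
  paste0(
    "<", tag, "><isolationWindow>",
    "<cvParam accession='MS:1000827' name='isolation window target m/z' value='", mz, "'/>",
    "</isolationWindow></", tag, ">"
  )
}

# Synthetic stand-in for one SRM chromatogram (all values invented for illustration)
mzml_txt <- paste0(
  "<mzML><run><chromatogramList>",
  "<chromatogram index='0' id='SRM SIC Q1=81 Q3=64'>",
  isolation_window("precursor", 81),
  isolation_window("product", 64),
  "<binaryDataArrayList>",
  binary_array("MS:1000595", c(8.47, 8.48, 8.49)),  # time array
  binary_array("MS:1000515", c(0, 12, 3)),          # intensity array
  "</binaryDataArrayList>",
  "</chromatogram>",
  "</chromatogramList></run></mzML>"
)

# Decode one binaryDataArray node back into a numeric vector
decode_array <- function(node) {
  raw_bytes <- base64decode(xml_text(xml_find_first(node, "binary")))
  readBin(raw_bytes, what = "double", n = length(raw_bytes) / 8,
          size = 8, endian = "little")
}

# Turn one chromatogram node into a long-format data frame
parse_chrom <- function(node) {
  target_xpath <- "/isolationWindow/cvParam[@accession='MS:1000827']"
  get_mz <- function(parent) {
    as.numeric(xml_attr(xml_find_first(node, paste0(parent, target_xpath)), "value"))
  }
  data.frame(
    chrom_type = xml_attr(node, "id"),
    chrom_index = as.integer(xml_attr(node, "index")),
    target_mz = get_mz("precursor"),
    product_mz = get_mz("product"),
    rt = decode_array(xml_find_first(node, ".//binaryDataArray[cvParam/@accession='MS:1000595']")),
    int = decode_array(xml_find_first(node, ".//binaryDataArray[cvParam/@accession='MS:1000515']"))
  )
}

doc <- read_xml(mzml_txt)
chroms <- do.call(rbind, lapply(xml_find_all(doc, "//chromatogram"), parse_chrom))
print(chroms)
```

Chromatograms without a precursor or product block (such as a TIC) would simply get `NA` in the two m/z columns under this approach.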

Currently, the error below is returned when MRM data is kept as chromatograms:

Error in UseMethod("xml_find_first") : no applicable method for 'xml_find_first' applied to an object of class "xml_missing"

By converting chromatograms to spectra, the MRM data can be obtained as normal "MS1".

Attached is an example mzML data file with MRM data for nitrosamines stored as chromatograms: Example_MRM_Nitrosamines.zip

Please let me know if you have further questions regarding the request/idea.

Thank you in advance for your consideration, and keep up the good work. Ricardo

wkumler commented 1 year ago

Hi Ricardo,

Thanks for sharing your request and some data to test it with! I like the idea of RaMS becoming more friendly to MRM data and preserving those chromatograms in the final output. I've just gotten back from a few weeks of fieldwork but I'll give this a closer look over the next few days and see if there's a good way to implement this.

ricardo-cunha commented 1 year ago

Hi William, nice to hear that you think it is a good idea. One could grab "chromatograms" to parse the chromatograms in a given mzML. Often the TIC is also stored as a chromatogram, so it should be distinguished from the existing "TIC" of RaMS. Perhaps there could be an entry chromatograms in the returned list, holding a data.table where one column is the ID of the chromatogram. Meanwhile, I have implemented such an approach offline. Shall I propose it via a pull request?

wkumler commented 1 year ago

Yep! Go ahead and submit a PR if you've already got some code written - I'm not seeing any changes on your fork. I agree that adding a "chromatograms" option to grab_what is probably a good way of approaching this. The file you shared throws an error because the metadata is normally pulled from the first "spectrum" xml entry which doesn't exist, thus the xml_missing error.
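
To illustrate the failure mode described above: `xml2::xml_find_first()` returns an object of class `xml_missing` when nothing matches, and the error quoted in this thread shows that the xml2 version in use had no `xml_find_first` method for that class. The sketch below is not RaMS code. It uses two synthetic one-line documents and a hypothetical helper, `grab_spectrum_id()`, to show one way of checking for the missing node and emitting a message rather than failing.

```r
library(xml2)

# Synthetic documents: one holds only chromatograms, the other has a spectrum
chrom_only <- read_xml(
  "<mzML><run><chromatogramList><chromatogram id='TIC'/></chromatogramList></run></mzML>"
)
with_spectrum <- read_xml(
  "<mzML><run><spectrumList><spectrum id='scan=1'/></spectrumList></run></mzML>"
)

# Hypothetical helper: read metadata from the first spectrum only if one exists
grab_spectrum_id <- function(doc) {
  first_spectrum <- xml_find_first(doc, "//spectrum")
  if (inherits(first_spectrum, "xml_missing")) {
    message("No spectra found in this file; skipping spectrum metadata")
    return(NULL)
  }
  xml_attr(first_spectrum, "id")
}

chrom_only_id <- grab_spectrum_id(chrom_only)        # message, then NULL
with_spectrum_id <- grab_spectrum_id(with_spectrum)  # the spectrum's id attribute
```

Checking `length(xml_find_all(doc, "//spectrum")) == 0` would be an equivalent guard that avoids relying on the `xml_missing` class name.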

wkumler commented 1 year ago

I've been able to spend some time on this today and it looks like extracting the chromatograms should be fairly simple - much of the code used elsewhere in the package can be reused here so it's mainly a question of what form the output should take. I've mocked up a possible data table below:

knitr::kable(msdata$chroms[12000:12006,])
| chrom_type | chrom_index | target_mz | product_mz | rt | int |
|:--|--:|--:|--:|--:|--:|
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.475967 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.480417 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.484883 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.489333 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.493800 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.498267 | 0 |
| SRM SIC Q1=81 Q3=64 sample=42 period=1 experiment=1 transition=1 | 3 | 81 | 64 | 8.502717 | 0 |

which allows the plotting of individual chromatograms using the following (relatively intuitive) syntax:

library(ggplot2)
chrom5 <- msdata$chroms[chrom_index==5]
ggplot(chrom5) +
  geom_line(aes(x=rt, y=int)) +
  ggtitle(unique(chrom5$chrom_type))
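
Because the mocked-up table is in long format, a per-transition overview can also be derived with ordinary data.table grouping. The sketch below uses a small synthetic table with the same column names as the mock-up. All values are invented for illustration, and `transition_summary` is a hypothetical name.

```r
library(data.table)

# Synthetic long-format table mirroring the proposed columns (values invented)
chroms <- data.table(
  chrom_type = rep(c("SRM SIC Q1=81 Q3=64", "SRM SIC Q1=75 Q3=43"), each = 4),
  chrom_index = rep(c(3L, 4L), each = 4),
  target_mz = rep(c(81, 75), each = 4),
  product_mz = rep(c(64, 43), each = 4),
  rt = rep(c(8.47, 8.48, 8.49, 8.50), times = 2),
  int = c(0, 120, 340, 10, 5, 80, 60, 0)
)

# One row per transition: number of points, apex retention time, apex intensity
transition_summary <- chroms[
  , .(n_points = .N, rt_at_max = rt[which.max(int)], max_int = max(int)),
  by = .(chrom_index, target_mz, product_mz)
]
print(transition_summary)
```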

ricardo-cunha commented 1 year ago

Hi William,

The edits are not in the branch yet, but it seems that your approach is what I had thought of doing and had started to do. I find it useful that the target and product m/z values can be obtained. Cool! From my side, that would be sufficient to read SRM data. Once that is added to the package, I will give it a try with some other data we have.

Also, when no spectra are present, you could return a message saying so instead of throwing the error.

Thanks a lot for implementing this. I will come back to you if I have further ideas for improvement.

Cheers, Ricardo

wkumler commented 1 year ago

Edits should be in the implement_chroms branch now! I've added a couple unit tests as well but could use some more comprehensive testing if you're willing to give it a try with some fresh data.

wkumler commented 1 year ago

@ricardobachertdacunha Any luck checking out this latest branch? I'd love to get a few more files tested before I merge it into main.

ricardo-cunha commented 1 year ago

Sorry for the late reply, I am currently out of office. I will do the tests after September 12. I can also share other files with you indeed.

ricardo-cunha commented 1 year ago

@wkumler, I finally tested the branch for SRM parsing with files acquired in MRM mode from two systems (Sciex and Shimadzu). It worked as expected.

There is only one issue, and it is not related to MRM parsing. For mzML files acquired with negative polarity, the polarity is returned as "not found". That is because @accession="MS:1000130" is used for positive mode, while @accession="MS:1000129" is used for negative mode. I will amend this and make a pull request soon. I still need to test mzXML, but I think the issue does not apply there, as the node is different.

Thank you very much for adding this functionality to the package.

Cheers, Ricardo
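
The polarity point above can be sketched as a check for both accessions rather than only the positive one. This is an illustration under stated assumptions, not the fix that went into RaMS: `detect_polarity()` is a hypothetical helper, the two documents are synthetic, and no XML namespace is declared (real mzML files declare one, which would need handling first).

```r
library(xml2)

# Hypothetical helper: report polarity from the PSI-MS accessions in the file
detect_polarity <- function(doc) {
  has_pos <- length(xml_find_all(doc, "//cvParam[@accession='MS:1000130']")) > 0
  has_neg <- length(xml_find_all(doc, "//cvParam[@accession='MS:1000129']")) > 0
  if (has_pos && has_neg) {
    "mixed"
  } else if (has_pos) {
    "positive"
  } else if (has_neg) {
    "negative"
  } else {
    "not found"
  }
}

# Synthetic single-spectrum documents, one per polarity
pos_doc <- read_xml(
  "<mzML><run><spectrum id='s1'><cvParam accession='MS:1000130' name='positive scan'/></spectrum></run></mzML>"
)
neg_doc <- read_xml(
  "<mzML><run><spectrum id='s1'><cvParam accession='MS:1000129' name='negative scan'/></spectrum></run></mzML>"
)

pos_result <- detect_polarity(pos_doc)
neg_result <- detect_polarity(neg_doc)
```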