feature request: read different types of scan

tentrillion commented 1 year ago

I've been using RaMS for about a year and it is amazing: finally an easy, dependency-light way to quickly read and tidily manipulate MS data! My feature request: could there be a way to label the different types of MS1 scans acquired in the same experiment? (I often acquire data like this when varying ion source parameters.)

I have *.mzML files generated from Sciex *.wiff2 files that were acquired from a qTOF that was running multiple MS1 scan types. Sciex calls the different scan types "experiments" and in the XML (generated via ProteoWizard) these different scan types are referred to like this:

<spectrum index="0" id="sample=1 period=1 cycle=1 experiment=2" defaultArrayLength="2271">
[...]
<spectrum index="1" id="sample=1 period=1 cycle=1 experiment=4" defaultArrayLength="4300">
[...]
<spectrum index="3" id="sample=1 period=1 cycle=1 experiment=7" defaultArrayLength="3">

I'd be happy to supply an example mzML file.

I imagine one output type might be an extra column (relative to what grab_what = c('MS1') returns) containing the spectrum id strings, e.g. "sample=1 period=1 cycle=1 experiment=4".
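
To make the idea a bit more concrete, here is a rough sketch of how id strings like these could be split into one column per key. The name parse_spectrum_ids is made up for illustration and is not a RaMS function, and the sketch assumes every id carries the same keys in the same order with integer values.

```r
# Hypothetical helper (not part of RaMS): split spectrum id strings such as
# "sample=1 period=1 cycle=1 experiment=4" into one integer column per key.
parse_spectrum_ids <- function(ids) {
  # Each id is a space-separated list of key=value pairs
  pairs <- strsplit(ids, " ", fixed = TRUE)
  rows <- lapply(pairs, function(p) {
    kv <- strsplit(p, "=", fixed = TRUE)
    vals <- as.integer(vapply(kv, `[`, character(1), 2))
    names(vals) <- vapply(kv, `[`, character(1), 1)
    as.data.frame(as.list(vals))
  })
  # Assumes all ids share the same keys, so the one-row frames stack cleanly
  do.call(rbind, rows)
}

ids <- c("sample=1 period=1 cycle=1 experiment=2",
         "sample=1 period=1 cycle=1 experiment=4",
         "sample=1 period=1 cycle=1 experiment=7")
id_df <- parse_spectrum_ids(ids)
id_df$experiment
```

Keeping all four fields rather than only the experiment number would also leave room for files with more than one sample or period.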

wkumler commented 1 year ago

Hi @tentrillion, thanks for the feature request. I'm a little swamped with other tasks for my PhD right now (as you may have already determined from the backlog of issues) but I'm hoping to push out v1.4 by the end of the year or early 2024. This looks like a good contribution for that but I will need some demo mzML files. If you're able to share one or two publicly with a Box or Dropbox link that'd be great - otherwise we'll have to chat about a good way of getting those to me for testing.

wkumler commented 1 year ago

Hi @tentrillion, I've got some time to work on this issue now and think it's worth the effort. Do you have a demo mzML file you're able to share?

tentrillion commented 3 months ago

Apologies for missing your November reply until now. I couldn't figure out how to attach an mzML directly in this thread. As an (odd, I know) workaround, I've committed it to a git repo I use to store and publish miscellaneous notebooks. LMK if there's a better way to send you these; I have more if you need them. https://github.com/tentrillion/ipython_notebooks/blob/master/example_sciex_multiMS1scantypes.mzML

wkumler commented 3 months ago

Hi @tentrillion, thanks for providing the demo file! It's a good question how best to get this data and combine it with the rest of the MS1 info. It feels like a job for something similar to grabAccessionData, but because the information is stored in the spectrum tag's id attribute itself, that function can't extract it. Instead I had to manually read in the XML and extract the experiment number, bind that to the associated retention time, and then merge it back onto the MS1 info. This could definitely be streamlined into a single function (which would then also avoid having to read the mzML file more than once), but is this essentially what you're looking for?

library(xml2)
library(RaMS)
library(data.table)  # provides %between%, used in the plots below
library(ggplot2)

mzml_path <- "~/../Downloads/example_sciex_multiMS1scantypes.mzML"

# Pull the experiment number out of each spectrum's id attribute
xml_data <- read_xml(mzml_path)
all_spectra <- xml_find_all(xml_data, "//d1:spectrum")
scan_ids <- xml_attr(all_spectra, "id")
experiment_nums <- as.numeric(gsub(".*experiment=", "", scan_ids))

# Scan start time (accession MS:1000016) for each spectrum
scan_rts <- grabAccessionData(mzml_path, "MS:1000016")
rt_id_df <- data.frame(rt=as.numeric(scan_rts$value), exp_num=experiment_nums)

msdata <- grabMSdata(mzml_path)

# Attach the experiment number to every MS1 data point via its retention time
ms1_w_expnum <- merge(msdata$MS1, rt_id_df, by="rt")
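
One caveat with merging on rt: it only works cleanly if every scan has its own retention time. If two experiments ever shared an rt, each MS1 row at that time would be duplicated once per experiment. A quick check along these lines may be worth running before trusting the merge; the toy table below is made up for illustration and just stands in for rt_id_df.

```r
# Toy stand-in for rt_id_df (values are invented for illustration)
rt_id_toy <- data.frame(rt = c(0.010, 0.012, 0.014, 0.014),
                        exp_num = c(2, 4, 7, 8))

# TRUE if any retention time is attached to more than one scan,
# in which case a merge on rt would duplicate MS1 rows
has_shared_rt <- anyDuplicated(rt_id_toy$rt) > 0
has_shared_rt
```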

There seems to be some quirky data in the file: each mass is "bracketed" by zero-intensity points, one at slightly lower and one at slightly higher mass, creating a strange triplicate pattern of data points:

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 100)]) +
  geom_point(aes(x=rt, y=mz, color=int>0))

but when those points are removed you can see the instrument cycling through each of the different MS1 scan types

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 100)][int>0]) +
  geom_point(aes(x=rt, y=mz, color=factor(exp_num)))

and you can then use the experiment number to separate out the types of scans and plot them individually

ggplot(ms1_w_expnum[mz%between%pmppm(371.09458, 10)][int>0]) +
  geom_line(aes(x=rt, y=int)) +
  facet_wrap(~exp_num, ncol=1, scales = "free_y")
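
As for streamlining the extraction, a minimal sketch might look like the following. It is only a sketch under assumptions: grab_experiment_table is a hypothetical name, not a RaMS function; it assumes each spectrum carries a scan start time cvParam (MS:1000016) and a Sciex-style id ending in experiment=N; and the tiny mzML string at the bottom is invented so the example runs on its own.

```r
library(xml2)

# Hypothetical helper (not a RaMS function): parse the mzML once and return
# one row per spectrum with its scan start time and experiment number.
grab_experiment_table <- function(mzml) {
  xml_data <- read_xml(mzml)
  ns <- xml_ns(xml_data)  # the default mzML namespace gets the prefix d1
  spectra <- xml_find_all(xml_data, "//d1:spectrum", ns)
  # First scan start time cvParam found inside each spectrum node
  rt_nodes <- xml_find_first(spectra, ".//d1:cvParam[@accession='MS:1000016']", ns)
  data.frame(
    rt = as.numeric(xml_attr(rt_nodes, "value")),
    exp_num = as.numeric(gsub(".*experiment=", "", xml_attr(spectra, "id")))
  )
}

# Invented two-spectrum document; read_xml() accepts XML text as well as a path
demo_mzml <- '<mzML xmlns="http://psi.hupo.org/ms/mzml"><run><spectrumList>
<spectrum index="0" id="sample=1 period=1 cycle=1 experiment=2"><scanList><scan>
<cvParam accession="MS:1000016" name="scan start time" value="0.01"/>
</scan></scanList></spectrum>
<spectrum index="1" id="sample=1 period=1 cycle=1 experiment=4"><scanList><scan>
<cvParam accession="MS:1000016" name="scan start time" value="0.02"/>
</scan></scanList></spectrum>
</spectrumList></run></mzML>'

exp_table <- grab_experiment_table(demo_mzml)
exp_table
```

This folds the xml2 and grabAccessionData steps into a single parse. grabMSdata would still read the file itself, so avoiding that second read would mean building the id extraction into RaMS. The sketch also takes the rt value as written in the file, without checking its unit.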

wkumler / RaMS

feature request: read different types of scan #20