rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Ion Mobility #142

Open ococrook opened 3 years ago

ococrook commented 3 years ago

Very useful project, guys! Sorry if the below can already be found and I missed it.

Currently the core variables are msLevel, rtime, scanIndex. Ion mobility can be accessed via

spectraData(myspectra, columns = c("ionMobilityDriftTime"))

Would it be possible to optionally upgrade ion mobility to a core variable and add easy access? Something like

imTime(myspectra) # intended output numeric vector with ion mobility times

and a respective filter

filterIm(myspectra, ionMob = c(3, 4)) # intended output: returns a Spectra object with just those ion mobilities

Maybe this belongs in a different package, but the following would also be amazing

intensity2d(myspectra, rt = c(175, 189), ionMob = c(3, 4)) # intended output: plots a heatmap/image of (binned) intensities in rt and ion mobility coordinates

Would happily implement and send a pull request if you could point me to examples.

jorainer commented 3 years ago

I am a little reluctant to add ionMobilityDriftTime as a core spectra variable, since only a few (one?) instruments currently record/provide it. Access should, however, also be possible with myspectra$ionMobilityDriftTime instead of spectraData(myspectra, columns = ...).
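The two access routes mentioned above can be compared on a toy object. This is only a minimal sketch: the drift times, retention times and peak values are made up for illustration, and it assumes the Spectra package is installed from Bioconductor.

```r
library(Spectra)  # attaches S4Vectors, which provides DataFrame()

# Toy data (made up): three MS1 spectra with arbitrary drift times
spd <- DataFrame(
    msLevel = c(1L, 1L, 1L),
    rtime = c(175.2, 175.2, 175.9),
    ionMobilityDriftTime = c(2.5, 3.5, 4.5)
)
spd$mz <- list(c(100.1, 200.2), c(100.1, 200.2), c(100.1, 200.2))
spd$intensity <- list(c(10, 20), c(30, 40), c(50, 60))
sps <- Spectra(spd)

# Route 1: go through spectraData() and pick the column
im_a <- spectraData(sps, columns = "ionMobilityDriftTime")$ionMobilityDriftTime

# Route 2: use `$` directly on the Spectra object
im_b <- sps$ionMobilityDriftTime
```

Both routes should give the same numeric vector; the `$` form is simply shorter to type.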

Adding a filterIonMobilityDriftTime method for Spectra would surely be possible, but I would add that only to the Spectra object and not to the backend.

Also, the plotting function sounds like something specific to ion mobility. I'm wondering whether it would make sense to put that into an IonMobility package that extends Spectra?
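To make the requested heatmap more concrete, here is a rough base R sketch of binning intensities on a retention time by ion mobility grid. The helper name bin_intensity_2d, its arguments and the numbers are all hypothetical; nothing like this exists in Spectra, and a real implementation would take its input from a Spectra object.

```r
# Made-up values: one retention time, drift time and summed intensity
# per spectrum
rt  <- c(175.5, 176.5, 176.5, 180.0, 188.0, 188.0)
im  <- c(3.1, 3.1, 3.6, 3.9, 3.2, 3.2)
tic <- c(10, 20, 30, 40, 50, 60)

# Hypothetical helper: sum intensities in a regular rt x ion mobility grid
bin_intensity_2d <- function(rt, im, intensity, rt_range, im_range,
                             nbins = c(7L, 4L)) {
    keep <- rt >= rt_range[1] & rt <= rt_range[2] &
        im >= im_range[1] & im <= im_range[2]
    rt_breaks <- seq(rt_range[1], rt_range[2], length.out = nbins[1] + 1L)
    im_breaks <- seq(im_range[1], im_range[2], length.out = nbins[2] + 1L)
    rt_bin <- cut(rt[keep], rt_breaks, include.lowest = TRUE)
    im_bin <- cut(im[keep], im_breaks, include.lowest = TRUE)
    # tapply() returns one cell per combination of bins; empty cells are NA
    z <- tapply(intensity[keep], list(rt_bin, im_bin), sum)
    z[is.na(z)] <- 0
    list(z = z, rt_breaks = rt_breaks, im_breaks = im_breaks)
}

res <- bin_intensity_2d(rt, im, tic, rt_range = c(175, 189),
                        im_range = c(3, 4))

# Draw the heatmap only in an interactive session
if (interactive())
    image(x = res$rt_breaks, y = res$im_breaks, z = res$z,
          xlab = "retention time", ylab = "ion mobility")
```

The bin counts are arbitrary here; in practice they would depend on the instrument's sampling in both dimensions.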

But those are just my first thoughts; happy to discuss.

ococrook commented 3 years ago

Fair to leave it out of the core variables. I think there are 5 Waters instruments with IMS and 3 Thermo instruments with IMS.

A filter method would be great - agreed no need to add to backend.

Maybe there is some value in a separate ion mobility package for QC and visualisation specific to that data?

jorainer commented 3 years ago

> Maybe there is some value in a separate ion mobility package for QC and visualisation specific to that data?

I think so - I have no experience with ion mobility data, but having a package with a vignette and some use cases would be perfect. Currently we're trying not to pack too much functionality into each RforMassSpec package - we want to keep them small, lightweight (easier to maintain) and modular.

ococrook commented 3 years ago

Cool, maybe I should have a chat with @lgatto about an ion mobility package and find some simple use cases.

lgatto commented 3 years ago

I agree that a dedicated IonMobility package would probably be a good start. We could always port some functionality that appears to be of general interest back into Spectra, if necessary.

I am happy to help out. The first thing to do would be to have a small test data (ideally that fits in memory) but useful enough for development and demonstration.

I started to set up an msdata2 package that, like msdata, distributes MS data, but through ExperimentHub. That would be a good place to put that (and other) data. Let me know if I need to move that up the priority list.

ococrook commented 3 years ago

Cool, @lgatto and @jorainer, I'll see if I can get hold of some ion mobility data that I can share.

Would you prefer it as an .mzML file or a Spectra object?

lgatto commented 3 years ago

At this stage, I think one or two mzML files would be most appropriate. Feel free to test and play with Spectra, of course. Then, we could craft a smallish Spectra object with slices of these two files.

jorainer commented 3 years ago

I agree with @lgatto - it would be ideal to have two mzML files. We could always filter them to a smaller m/z and/or rt range to keep them small, which also allows for faster data processing.
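Trimming to a smaller rt and m/z range could look roughly like the sketch below, which uses Spectra's filterRt and filterMzRange on made-up in-memory data. The ranges and values are arbitrary assumptions; for real files the Spectra object would be created from the mzML files instead.

```r
library(Spectra)  # attaches S4Vectors, which provides DataFrame()

# Toy data (made up): four MS1 spectra sharing the same three peaks
spd <- DataFrame(msLevel = rep(1L, 4), rtime = c(100, 176, 180, 300))
spd$mz <- rep(list(c(100.1, 250.5, 400.2)), 4)
spd$intensity <- rep(list(c(5, 15, 25)), 4)
sps <- Spectra(spd)

# Keep only spectra with a retention time between 175 and 189 ...
small <- filterRt(sps, rt = c(175, 189))
# ... and, within those, only peaks with an m/z between 200 and 300
small <- filterMzRange(small, mz = c(200, 300))

length(small)
mz(small)
```

filterRt drops whole spectra, whereas filterMzRange removes peaks within the remaining spectra, so the two filters reduce the data along different axes.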

ococrook commented 3 years ago

Hi guys, I'm starting to think about ion mobility data again. I've got 89 or so IM datasets; for roughly half of them the IM dimension has been removed for benchmarking purposes (to show that IM really does improve SNR).

Each .mzML file is about 4GB, which is quite large. To give you an idea of the scaling, for every RT there are ~200 ion mobility times.

The data can easily be parsed into a Spectra object, which automatically generates a column called ionMobilityDriftTime.

I think a lot can be done with a simple filterIonm function, but there could be more functionality in the future. I think I want to go back on what I said about backends. These data are likely to generate millions of spectra very easily, and being able to extract slices of ion mobility times from the on-disk backend would be very useful.

There is also some suggestion that groups are starting to remove the LC dimension altogether and just use IMS, or to use very short gradients. Example: https://www.nature.com/articles/s41592-020-00999-z

Happy to discuss and write the code as needed, but would be grateful for examples, feedback and direction.

ococrook commented 3 years ago

@jorainer I don't know exactly how you'd like to implement this, but what are your thoughts on something like the following:

Easy access to the ion mobility columns

## access ion mobility columns
imtime <- function(object){

    Spectra:::.get_column(spectraData(object), "ionMobilityDriftTime")
}

# methods
setMethod("imtime", "Spectra", function(object) {
    Spectra:::.get_column(spectraData(object), "ionMobilityDriftTime")
})

setMethod("imtime", "MsBackendDataFrame", function(object) {
    Spectra:::.get_column(spectraData(object), "ionMobilityDriftTime")
})

Filtering the ion mobility columns

# filter ion mobility, leaving spectra of other MS levels untouched
.filterIonMobility <- function(x,
                               imtime = numeric(),
                               msLevel = integer()) {
    sel_ms <- msLevel(x) %in% msLevel
    sel_im <- MsCoreUtils::between(spectraData(x)[, "ionMobilityDriftTime"], imtime)
    x[which(!sel_ms | sel_im)]
}

# generic
setGeneric("filterIonMobility", function(object, ...)
    standardGeneric("filterIonMobility"))

# method
setMethod("filterIonMobility", "Spectra",
          function(object, 
                   imtime = c(0, Inf),
                   msLevel. = unique(msLevel(object))) {
              if (!Spectra:::.check_ms_level(object, msLevel.))
                  return(object)
              if (is.numeric(imtime)) {
                  if (length(imtime) == 1)
                      imtime <- c(imtime, Inf)
                  if (length(imtime) != 2)
                      stop("'imtime' should be of length one specifying a ",
                           "lower ion mobility limit or of length two defining ",
                           "a lower and upper limit.")
                  object <- .filterIonMobility(object,
                                               imtime = imtime,
                                               msLevel = msLevel.)
                  object@processing <- Spectra:::.logging(
                      object@processing, "Filter: select ion mobility within ",
                      "[", imtime[1], ", ", imtime[2],
                      "] in spectra of MS level(s) ",
                      paste0(msLevel., collapse = ", "), ".")
              } else stop("'imtime' has to be numeric")
              object
          })

ococrook commented 3 years ago

(sorry for the calls to non-exported functions)

jorainer commented 3 years ago

Really sorry for my late reply. I have some suggestions for your code above:

setMethod("ionMobility", "Spectra", function(object, ...) {
    do.call("$", list(object@backend, "ionMobilityDriftTime"))
})
setReplaceMethod("ionMobility", "Spectra", function(object, value) {
    # update the backend, then return the Spectra object itself
    object@backend <- do.call("$<-", list(object@backend, "ionMobilityDriftTime", value))
    object
})

This should be faster and more lightweight than first calling spectraData on the Spectra (depending on the backend, that would require loading the full spectra data). If you use $ on the backend then, depending on its implementation in the backend, only that one column would be retrieved.

I'm however not sure about the name of the function. imtime is short, but it would be difficult for a new user to find that function when looking for a way to access the ion mobility drift time. I guess the correct name would be ionMobilityDriftTime, but that is rather long. So I would call it ionMobility, unless the drift time is really important and ion mobility (without the time) has a different meaning. But that's something I don't know - maybe you can have a chat with your MS people to get some feedback on how best to name this function?

The filterIonMobility looks reasonable to me. I would, however, then also use the ionMobility method from above instead of spectraData(object)[, "ionMobilityDriftTime"]. I would maybe also call imtime <- range(imtime) in the if (is.numeric(imtime)) condition, as you could then drop some of the if statements.
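As a rough illustration of these suggestions, a trimmed-down filter could look like the sketch below. The function name filter_im_sketch and the toy data are hypothetical; it uses `$` on the Spectra object rather than an ionMobility method, and it leaves out the MS level handling and the processing log of the original proposal.

```r
library(Spectra)  # attaches S4Vectors, which provides DataFrame()

# Hypothetical helper: keep spectra whose drift time lies within `imtime`
filter_im_sketch <- function(object, imtime = c(0, Inf)) {
    if (!is.numeric(imtime))
        stop("'imtime' has to be numeric")
    # range() turns any numeric input into a lower and an upper limit
    imtime <- range(imtime)
    drift <- object$ionMobilityDriftTime
    object[which(drift >= imtime[1L] & drift <= imtime[2L])]
}

# Toy data (made up): three MS1 spectra with arbitrary drift times
spd <- DataFrame(
    msLevel = c(1L, 1L, 1L),
    rtime = c(175.2, 175.2, 175.9),
    ionMobilityDriftTime = c(2.5, 3.5, 4.5)
)
spd$mz <- list(c(100.1, 200.2), c(100.1, 200.2), c(100.1, 200.2))
spd$intensity <- list(c(10, 20), c(30, 40), c(50, 60))
sps <- Spectra(spd)

res <- filter_im_sketch(sps, imtime = c(3, 4))
res$ionMobilityDriftTime
```

One consequence of using range() is that a single value such as imtime = 3 becomes the interval [3, 3] rather than [3, Inf), so the length-one convenience of the original code would need to be handled separately if it is still wanted.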

ococrook commented 3 years ago

Great, thanks for the feedback Johannes. Will implement the above suggestion and find out the best naming. This will sit in my IonMobility package for the moment.

Cheers!

ococrook commented 3 years ago

@jorainer Just in case you're interested, others have recommended IonMobilityTime, and there are two reasons for this. First, some convert these units to collision cross sections in native MS, so specifying the time unit is useful. Second, in Bruker instruments it's not a drift but a trap, so "drift" does not apply to every instrument.

jorainer commented 3 years ago

Excellent! Thanks for the feedback!

MuyaoXi9271 commented 9 months ago

> @jorainer Just in case you're interested, others have recommended IonMobilityTime, and there are two reasons for this. First, some convert these units to collision cross sections in native MS, so specifying the time unit is useful. Second, in Bruker instruments it's not a drift but a trap, so "drift" does not apply to every instrument.

Hi,

May I ask how to get the trap information from an .mzML file acquired on a Bruker instrument? Thanks very much.

Best, Muyao