rformassspectrometry / Spectra

Low level infrastructure to handle MS spectra
https://rformassspectrometry.github.io/Spectra/

Coerce Spectra to DataFrame #325

Closed lgatto closed 4 months ago

lgatto commented 5 months ago

I would like to add the following to the package:

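asDataFrame <- function(object, i = seq_along(object),
                        spectraVars = spectraVariables(object)) {
    ## Restrict to the requested spectra.
    object <- object[i]
    ## Number of peaks in each spectrum.
    n <- sapply(peaksData(object), nrow)
    ## Repeat each spectrum's variables once per peak; drop = FALSE keeps
    ## a DataFrame even when a single spectra variable is requested.
    v <- spectraData(object)[rep(seq_along(object), n), spectraVars,
                             drop = FALSE]
    ## Stack the per-spectrum peak matrices (mz, intensity).
    p <- do.call(rbind, peaksData(object))
    cbind(p, v)
}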
> library(Spectra)
> sciex <- Spectra(sciex_file, backend = MsBackendMzR())
> asDataFrame(sciex, i = 1:3, spectraVariables(sciex)[1:3])
DataFrame with 3707 rows and 5 columns
            mz intensity   msLevel     rtime acquisitionNum
     <numeric> <numeric> <integer> <numeric>      <integer>
1      105.043         0         1      0.28              1
2      105.045       412         1      0.28              1
3      105.046         0         1      0.28              1
4      107.055         0         1      0.28              1
5      107.057       412         1      0.28              1
...        ...       ...       ...       ...            ...
3703   133.984         0         1     0.838              3
3704   133.985       132         1     0.838              3
3705   133.987         0         1     0.838              3
3706   133.989       132         1     0.838              3
3707   133.990         0         1     0.838              3

The goal is of course not to replace Spectra objects with long tables. A long table could, however, be a useful intermediate data structure for plotting and a useful interface to the tidyomics initiative.
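The expansion that asDataFrame() performs can be tried without a Spectra object. The following is a minimal base R sketch under assumed toy data: `peaks` and `vars` are hypothetical stand-ins for the results of peaksData() and spectraData(), and the result is a plain data.frame rather than a DataFrame.

```r
## Hypothetical stand-in for peaksData(): one (mz, intensity) matrix
## per spectrum.
peaks <- list(
    cbind(mz = c(105.043, 105.045), intensity = c(0, 412)),
    cbind(mz = c(133.984, 133.985, 133.987), intensity = c(0, 132, 0))
)
## Hypothetical stand-in for spectraData(): one row per spectrum.
vars <- data.frame(msLevel = c(1L, 1L), rtime = c(0.28, 0.838))

## Number of peaks in each spectrum.
n <- vapply(peaks, nrow, integer(1))
## Repeat each spectrum's row once per peak, then bind it to the
## stacked peak matrices.
v <- vars[rep(seq_along(peaks), n), , drop = FALSE]
long <- cbind(as.data.frame(do.call(rbind, peaks)), v)
rownames(long) <- NULL

## With one row per peak, a per-spectrum summary such as the summed
## intensity becomes a single grouped call.
tic <- tapply(long$intensity, long$rtime, sum)
```

Grouping by `rtime` only works here because the two toy spectra have distinct retention times; with real data a unique spectrum identifier would be the safer grouping key.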

jorainer commented 4 months ago

sounds like a good idea - but what would be the advantage of DataFrame over data.frame?

lgatto commented 4 months ago

By default it returns a DataFrame, because that is what spectraData() returns, but a data.frame is fine by me and probably more relevant here if the idea is to transition to a tidyverse/dplyr/ggplot workflow.
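If a plain data.frame is preferred, the conversion is a single call. A minimal sketch, assuming the S4Vectors package is installed; the `long` object is a hypothetical three-row example shaped like the asDataFrame() output above, not real output:

```r
library(S4Vectors)

## Hypothetical long table with the same kind of columns as the
## asDataFrame() output.
long <- DataFrame(mz = c(105.043, 105.045, 105.046),
                  intensity = c(0, 412, 0),
                  msLevel = rep(1L, 3))

## as.data.frame() yields a base data.frame, which dplyr and ggplot2
## accept directly.
df <- as.data.frame(long)
is.data.frame(df)
```

Doing this conversion inside asDataFrame() would spare users the extra step, at the cost of no longer returning the same class as spectraData().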

jorainer commented 4 months ago

I'm fine with either DataFrame or data.frame - was just curious.