wkumler / RaMS

R-based access to Mass-Spectrometry data

Constructing tmzMLs on the fly #16

Closed wkumler closed 8 months ago

wkumler commented 1 year ago

I wonder if it would be possible to add an option to grabMSdata that constructs the tmzMLs on the fly rather than requiring a separate step. Instead of reading the mzMLs directly into memory, the option (as_tmzML = TRUE?) would convert the files to tmzML in a temporary directory, then construct and return the tmzML object. Given the (intentional) similarities between the two types, this seems like something that could be streamlined. I often find myself deterred by the initial tmzML construction step and end up spending more time waiting for the same files to load repeatedly. The option could also be suggested when memory limits are approached: if the total size of the files to be loaded exceeds, say, a quarter of the system's RAM, grabMSdata could throw a warning and suggest using as_tmzML = TRUE.
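A minimal sketch of how the proposal above might look as a wrapper, not an existing RaMS feature. The names grab_as_tmzML and exceeds_ram_fraction are hypothetical, and the sketch assumes that RaMS::tmzmlMaker() takes an input path followed by an output path and that grabMSdata() accepts tmzML paths (as it does in the timing test later in this thread). The amount of RAM has to be supplied by the caller because base R has no portable way to query it, and on-disk size is only a rough proxy for in-memory size.

```r
# Hypothetical helper: TRUE (with a warning) when the files' total on-disk size
# exceeds a fraction of RAM. ram_bytes is supplied by the caller.
exceeds_ram_fraction <- function(files, ram_bytes, fraction = 0.25) {
  total_bytes <- sum(file.size(files), na.rm = TRUE)
  too_big <- total_bytes > fraction * ram_bytes
  if (too_big) {
    warning("Files total ", total_bytes, " bytes, more than ", fraction,
            " of available RAM; consider as_tmzML = TRUE")
  }
  invisible(too_big)
}

# Hypothetical wrapper: convert each input file to tmzML in a temporary
# directory, then pass the tmzML paths to the reader.
# converter and reader default to the RaMS functions (assumed signatures) but
# can be swapped out, e.g. for testing without RaMS installed.
grab_as_tmzML <- function(files,
                          converter = RaMS::tmzmlMaker,
                          reader = RaMS::grabMSdata,
                          tmp_dir = tempfile("tmzML_")) {
  dir.create(tmp_dir, showWarnings = FALSE)
  # Swap the .mzML/.mzXML extension for .tmzML, keeping the base name
  out_names <- sub("\\.mzX?ML$", ".tmzML", basename(files), ignore.case = TRUE)
  out_files <- file.path(tmp_dir, out_names)
  for (i in seq_along(files)) {
    converter(files[i], out_files[i])
  }
  reader(out_files)
}
```

The conversion cost is still paid once per session here, so this only helps if the returned object is then queried several times before the temporary directory is cleaned up.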

Expected issues include:

wkumler commented 11 months ago

As a note on timing: I ran into a question today where I wanted to pull out 3 masses from 24 files and wondered whether it would be faster to load the mzMLs with grab_what = "EIC" or to use the tmzMLs and query each file 3 times. As it turns out, the two methods are almost equivalent: the tmzMLs are faster up to 3 masses (about 29 s vs. 34 s elapsed here) and the mzMLs are expected to be faster with 4 or more.

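library(RaMS)        # grabMSdata(), pmppm()
library(data.table)  # %between%
library(dplyr)       # %>%, bind_rows() (assumed; the original session likely had the tidyverse loaded)
library(purrr)       # map()

adduct_mzs <- c(76.0763, 151.144, 401.125)
# tmzML route: connect to the files once, then query each mass separately
system.time({
  tmsdata <- grabMSdata(list.files("../tmzMLs/pos", pattern="190715_Smp", full.names = TRUE))
  adduct_eic <- adduct_mzs %>%
    map(function(adduct_mz_i)tmsdata$MS1[mz%between%pmppm(adduct_mz_i, 10)]) %>%
    bind_rows()
})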
   user  system elapsed 
   2.28    1.21   28.98 
system.time({
  new_msdata <- grabMSdata(list.files("../mzMLs/pos", pattern="190715_Smp", full.names = TRUE),
                           grab_what="EIC", mz=adduct_mzs, ppm=10)
})
Total time: 34.15 secs 
   user  system elapsed 
  24.37    1.47   34.14
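The "4+" estimate can be reproduced from the two elapsed times reported in this comment. This is a back-of-the-envelope sketch that assumes the tmzML time grows linearly with the number of masses (each mass is a separate query) and that the mzML time stays roughly constant, since file parsing dominates; neither assumption was measured here.

```r
# Elapsed times (seconds) reported in this comment
tmzml_elapsed <- 28.98  # 3 masses queried across 24 tmzML files
mzml_elapsed  <- 34.14  # one grab_what = "EIC" pass over 24 mzML files
n_masses <- 3

# Assumed linear scaling: cost per additional mass on the tmzML route
tmzml_per_mass <- tmzml_elapsed / n_masses

# Number of masses at which the two routes would take the same time
break_even <- mzml_elapsed / tmzml_per_mass

# Smallest whole number of masses for which the mzML route is expected to win
ceiling(break_even)
```

Under those assumptions the break-even point falls between 3 and 4 masses, which matches the estimate above.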

wkumler commented 8 months ago

With the shift towards arrow and other structured databases, I'm going to flag this as unnecessary now and close the issue.