wkumler closed this issue 8 months ago
As a note on timing: I ran into a question today where I wanted to pull out 3 masses from 24 files and wondered whether it would be faster to load them from the mzMLs with `grab_what = "EIC"` or to use the tmzMLs and query each file 3x. As it turns out, the two methods are almost equivalent, with the tmzMLs faster up to 3 masses (about 29 s vs. 34 s elapsed here) and the mzMLs expected to be faster with 4+.
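    library(RaMS)        # grabMSdata(), pmppm()
    library(data.table)  # %between%
    library(tidyverse)   # %>%, map(), bind_rows()

    adduct_mzs <- c(76.0763, 151.144, 401.125)
    system.time({
      tmsdata <- grabMSdata(list.files("../tmzMLs/pos", pattern = "190715_Smp", full.names = TRUE))
      # Query the tmzML connection once per mass, then stack the results
      adduct_eic <- adduct_mzs %>%
        map(function(adduct_mz_i) tmsdata$MS1[mz %between% pmppm(adduct_mz_i, 10)]) %>%
        bind_rows()
    })
       user  system elapsed
       2.28    1.21   28.98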
    system.time({
      new_msdata <- grabMSdata(list.files("../mzMLs/pos", pattern = "190715_Smp", full.names = TRUE),
                               grab_what = "EIC", mz = adduct_mzs, ppm = 10)
    })
    Total time: 34.15 secs
       user  system elapsed
      24.37    1.47   34.14
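The "4+" expectation can be checked with some quick arithmetic on the timings above. This is only a rough sketch: it assumes the tmzML time grows linearly with the number of masses (one query pass per mass) and that the mzML time stays roughly constant because it is dominated by reading the files once. Neither assumption was measured here, and all variable names are illustrative.

```r
# Elapsed times (seconds) reported above for 3 masses across 24 files
tmzml_elapsed_3 <- 28.98
mzml_elapsed_3  <- 34.14

# Assumption: tmzML cost scales linearly with the number of masses,
# while the mzML cost is treated as constant
per_mass_tmzml <- tmzml_elapsed_3 / 3

est <- data.frame(n_masses = 1:6)
est$tmzml_est <- per_mass_tmzml * est$n_masses
est$mzml_est  <- mzml_elapsed_3
est$faster    <- ifelse(est$tmzml_est < est$mzml_est, "tmzML", "mzML")

# Under these assumptions the crossover falls between 3 and 4 masses
print(est)
```

If the mzML read time also grows with the number of masses, the crossover would shift to a larger number, so this is best read as a lower bound.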
With the shift towards arrow and other structured databases, I'm going to flag this as unnecessary now and close the issue.
I wonder if it would be possible to add an option to `grabMSdata` that constructs the tmzMLs on the fly rather than making it a separate step. Instead of reading the mzMLs directly into memory, the option (`as_tmzML = TRUE`?) would convert the files to tmzML in a temporary directory, then construct and return the tmzML object. Given the (intentional) similarities between the two types, I wonder if it's possible to streamline this, because I often find myself held back by the initial tmzML construction step and end up spending more time waiting for the files to load repeatedly. This could also be suggested when memory limits are approached: if the total size of the files to be loaded exceeds, say, a quarter of the system's RAM, it could throw a warning and suggest using `as_tmzML = TRUE`.
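A rough sketch of what this could look like from the user side is below. Both helpers are hypothetical and not part of RaMS. The size check takes `ram_bytes` as an argument because base R has no portable way to query total system memory. The wrapper assumes `RaMS::tmzmlMaker(input_filename, output_filename)` is available for the conversion and that `grabMSdata` accepts the resulting `.tmzML` paths.

```r
# Hypothetical helper: warn when the files to load exceed a fraction of RAM.
# `ram_bytes` is supplied by the caller (assumption: no portable base-R query).
warn_if_too_big <- function(files, ram_bytes, max_fraction = 0.25) {
  total_bytes <- sum(file.size(files), na.rm = TRUE)
  too_big <- total_bytes > max_fraction * ram_bytes
  if (too_big) {
    warning(
      sprintf("Files total %.1f MB, more than %.0f%% of RAM; consider as_tmzML = TRUE",
              total_bytes / 1e6, 100 * max_fraction),
      call. = FALSE
    )
  }
  invisible(too_big)
}

# Hypothetical wrapper: convert each mzML into a temporary folder, then open
# the tmzMLs. Assumes RaMS::tmzmlMaker() and RaMS::grabMSdata() as noted above.
grab_as_tmzml <- function(files, tmp_dir = file.path(tempdir(), "tmzMLs")) {
  dir.create(tmp_dir, showWarnings = FALSE, recursive = TRUE)
  base_names <- tools::file_path_sans_ext(basename(files))
  out_files <- file.path(tmp_dir, paste0(base_names, ".tmzML"))
  for (i in seq_along(files)) {
    # Skip files that were already converted earlier in this session
    if (!file.exists(out_files[i])) RaMS::tmzmlMaker(files[i], out_files[i])
  }
  RaMS::grabMSdata(out_files)
}
```

Reusing already-converted files within a session is a design choice here, so repeated calls only pay the conversion cost once.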
Expected issues include:

- `on.exit` seems like it requires an active function, but maybe there's an equivalent for when the R session ends overall? How to handle R crashing? This could be a major issue because mass spec files are big and could easily clog up a user's system if not cleared out regularly.
- `as_tmzML` folder is requested?
- `glimpse` functionality...
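On the cleanup question, base R does have session-level hooks. Anything created under `tempdir()` is removed when the session ends normally, and `reg.finalizer(..., onexit = TRUE)` runs a finalizer at session end as well as on garbage collection. Neither runs if R crashes, so leftover folders would still need a sweep at the next start. A minimal sketch with hypothetical helper names:

```r
# Finalizer defined at top level so that its enclosing environment does not
# hold a reference to the handle (which would prevent garbage collection)
cleanup_tmzml_dir <- function(handle) {
  unlink(handle$path, recursive = TRUE)
  invisible(NULL)
}

# Hypothetical helper: create a temporary tmzML folder tied to a handle object.
# The folder is deleted when the handle is garbage collected or when the
# session ends normally (onexit = TRUE); a crash skips both.
make_tmzml_dir <- function() {
  handle <- new.env()
  handle$path <- tempfile("tmzML_")
  dir.create(handle$path)
  reg.finalizer(handle, cleanup_tmzml_dir, onexit = TRUE)
  handle
}

tmz_handle <- make_tmzml_dir()
dir.exists(tmz_handle$path)
```

Because the folder lives under `tempdir()`, a normal session exit would remove it even without the finalizer. The finalizer mainly frees disk space earlier, once the returned object is no longer in use.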