lgatto / MSnbase

Base Classes and Functions for Mass Spectrometry and Proteomics
http://lgatto.github.io/MSnbase/

Why is the readMSData function so slow when reading an mzML file? #580

Closed HongxiangXu closed 1 year ago

HongxiangXu commented 1 year ago

I have successfully run the mzML example from the MSnbase package through readMSData in just 1 second. I saw that this example file is very small (0.18 MB).

However, it takes more than 30 min to read my own mzML file (around 900 MB). Other packages read the same file quickly, but they do not produce the MSnbase objects I need for further analysis such as quantification.

    quantFile <- list.files("ccms_peak", pattern = "mzML", full.names = TRUE, recursive = TRUE)
    msexp <- readMSData(quantFile[1], verbose = FALSE)

My R server has 48 threads and more than 800 GB of RAM. How can I speed up this function?

lgatto commented 1 year ago

RAM and CPUs aren't the limiting factor when reading data from disk - disk access is.

Please also read the section about the in-memory and on-disk backends; the in-memory backend has a major impact on RAM requirements.
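As a rough sketch of the on-disk backend mentioned above: readMSData takes a mode argument, and with mode = "onDisk" only the spectrum metadata is read at import, while peak data is fetched from the file when it is needed. The "ccms_peak" directory comes from the question. Whether this removes the 30-minute wait for that particular file is an assumption, not something confirmed in this thread.

```r
library(MSnbase)

# Same file listing as in the question ("ccms_peak" is the poster's directory)
quantFile <- list.files("ccms_peak", pattern = "mzML",
                        full.names = TRUE, recursive = TRUE)

# mode = "onDisk" reads only spectrum header information at import time;
# m/z and intensity values are loaded from the file on demand
msexp <- readMSData(quantFile[1], mode = "onDisk", verbose = FALSE)

msexp                  # prints a summary of the OnDiskMSnExp object
table(msLevel(msexp))  # spectra per MS level, taken from the metadata
```

The object returned this way is an OnDiskMSnExp rather than the in-memory MSnExp, so it is worth checking that the downstream quantification steps accept it.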

Finally, please do consider using Spectra for all your raw data manipulation needs. More on the R for Mass Spectrometry initiative (which Spectra is part of) here: https://rformassspectrometry.github.io/docs/
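For illustration only, the same file could be opened with Spectra roughly as follows. This is a minimal sketch: it assumes the Spectra and mzR packages are installed and reuses the "ccms_peak" directory from the question. The MsBackendMzR backend keeps peak data on disk and reads it on demand.

```r
library(Spectra)

# Same file listing as in the question ("ccms_peak" is the poster's directory)
quantFile <- list.files("ccms_peak", pattern = "mzML",
                        full.names = TRUE, recursive = TRUE)

# MsBackendMzR reads spectrum metadata up front and peak data on demand
sps <- Spectra(quantFile[1], source = MsBackendMzR())

sps                          # prints a summary of the Spectra object
table(msLevel(sps))          # spectra per MS level
ms2 <- filterMsLevel(sps, 2L)  # keep only MS2 spectra
```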