Open Don86 opened 3 years ago
There's the `MsBackendHdf5Peaks` backend that stores the m/z and intensities on-disk in custom HDF5 data files. The spectra variables are still stored and manipulated in memory (in a `DataFrame`).
When you say HDF5 seems like a better storage option, I assume you are comparing it to mzML. Even though you aren't wrong, mzML (a specific XML-based implementation for MS data that is widely adopted) and HDF5 (a general data storage system) are hardly directly comparable.
By the way, I'm transferring this issue from the RforMassSpectrometry.org repo to the `Spectra` package, which is where the backend class and interface are defined.
Happy to find this issue still open! Would be great indeed 😊
Note that several backends are already available that support export in a variety of formats. You could import an mzML file and export it as an MGF file using the `MsBackendMgf` backend, but that might not be efficient. Alternatively, you could store the MS data from an mzML file in a SQL database (SQLite or MySQL) using `MsBackendSql`. Again, that's not a standard format; it's the format we define. But you could also read that data from the SQLite or MySQL database from Python and other languages.
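To illustrate the last point, here is a minimal Python sketch of reading spectra back out of a SQLite file using only the standard library. The table and column names (`spectrum`, `peak`, `spectrum_id`, `rtime`, `precursor_mz`) are hypothetical and chosen for the example; they are not the actual layout written by the SQL backend, so check the real schema of your database before adapting it.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory SQLite database with a HYPOTHETICAL layout.

    One table holds per-spectrum metadata, a second holds the peaks in
    long format. This is an assumption for illustration only.
    """
    con = sqlite3.connect(":memory:")
    con.executescript(
        """
        CREATE TABLE spectrum (
            spectrum_id INTEGER PRIMARY KEY,
            rtime REAL,
            precursor_mz REAL
        );
        CREATE TABLE peak (
            spectrum_id INTEGER,
            mz REAL,
            intensity REAL
        );
        """
    )
    con.executemany(
        "INSERT INTO spectrum VALUES (?, ?, ?)",
        [(1, 12.5, 301.14), (2, 13.1, 455.29)],
    )
    con.executemany(
        "INSERT INTO peak VALUES (?, ?, ?)",
        [(1, 100.1, 200.0), (1, 150.2, 50.0), (2, 120.3, 75.0)],
    )
    con.commit()
    return con


def load_spectra(con):
    """Return {spectrum_id: {"rtime", "precursor_mz", "peaks"}}."""
    spectra = {}
    query = "SELECT spectrum_id, rtime, precursor_mz FROM spectrum ORDER BY spectrum_id"
    for sid, rtime, precursor_mz in con.execute(query):
        spectra[sid] = {"rtime": rtime, "precursor_mz": precursor_mz, "peaks": []}
    # Peaks are attached to their spectrum, sorted by m/z.
    query = "SELECT spectrum_id, mz, intensity FROM peak ORDER BY spectrum_id, mz"
    for sid, mz, intensity in con.execute(query):
        spectra[sid]["peaks"].append((mz, intensity))
    return spectra
```

For a database on disk, `sqlite3.connect("path/to/file.sqlite")` would replace the in-memory connection; the path here is a placeholder.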
I saw them, and they are great for so many cases!
My (probably relatively rare) use case is matching (few) spectra against a (HUGE) spectral library, which stays fixed for a very long time. My feeling is that loading with an MGF backend takes ages, while loading with a DB backend is indeed faster, but still far from HDF5.
We faced this issue, with 99% of the time taken by loading the spectra (not the matching), in our https://github.com/mandelbrot-project/spectral_lib_matcher#using-binary-libraries, which is why we implemented binary libraries.
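The general idea of a binary library can be sketched as follows: parse the text library once, serialise the parsed result, and load the binary file on every later run. This is only an illustration using Python's `pickle`; it is not the implementation in the linked repository, and the function names and the simplified MGF parsing are assumptions made for the example.

```python
import os
import pickle
import tempfile


def parse_mgf_text(text):
    """Parse a minimal MGF-like string into a list of spectra.

    Simplified on purpose: KEY=VALUE lines become metadata, every other
    non-empty line inside an ion block is read as "mz intensity".
    """
    spectra, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "BEGIN IONS":
            current = {"meta": {}, "peaks": []}
        elif line == "END IONS":
            spectra.append(current)
            current = None
        elif current is not None and line:
            if "=" in line:
                key, value = line.split("=", 1)
                current["meta"][key] = value
            else:
                mz, intensity = line.split()[:2]
                current["peaks"].append((float(mz), float(intensity)))
    return spectra


def load_library(text, cache_path):
    """Load spectra from the binary cache if it exists, else parse and cache."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    spectra = parse_mgf_text(text)
    with open(cache_path, "wb") as fh:
        pickle.dump(spectra, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return spectra
```

The slow text parsing then happens only once per library. Note that pickle files should only be loaded from trusted sources, and the cache has to be rebuilt whenever the library changes.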
@Adafede, if you have a huge reference spectral library, you might consider storing it in a `CompDb` database (from the CompoundDb package). That package also provides a `Spectra` backend that retrieves the data directly from the database. That should be faster than using an MGF backend.
Hi,
I'd like to ask if there's currently a way to write out a `Spectra` S4 object, probably initially read as `.mzML` or `.mzXML`, as `.h5`? There doesn't seem to be this capability from what I've seen in the manual. HDF5 seems like a better storage option since it has a smaller file size, is well supported outside of the mass spec world, and is easily interoperable with Python as well.
Regards, Don