Open michabirklbauer opened 7 months ago
We could implement a common spectrum data structure that holds the fields we need.
Doesn't pyteomics provide a relatively unified interface to both MGF and mzML (and other common formats), at least for the spectral data? The overhead should not be very large.
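To make the idea a bit more concrete, here is a minimal sketch of what a common structure on top of pyteomics could look like. It relies on the fact that the spectrum dicts yielded by `pyteomics.mgf.read` and `pyteomics.mzml.read` both carry `'m/z array'` and `'intensity array'` keys. The names `Spectrum` and `spectrum_from_record` are hypothetical, and the choice of fields is only an assumption about "the things that we need".

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Mapping


@dataclass
class Spectrum:
    """Hypothetical format-independent spectrum container."""
    mz: List[float]
    intensity: List[float]
    # Everything format-specific (MGF params, mzML cvParams, ...) is kept here.
    metadata: Dict[str, Any] = field(default_factory=dict)


def spectrum_from_record(record: Mapping[str, Any]) -> Spectrum:
    """Build a Spectrum from a pyteomics-style spectrum dict.

    Only the two array keys shared by the MGF and mzML parsers are required;
    all remaining keys are copied into `metadata` untouched.
    """
    array_keys = ("m/z array", "intensity array")
    mz = [float(x) for x in record["m/z array"]]
    intensity = [float(x) for x in record["intensity array"]]
    if len(mz) != len(intensity):
        raise ValueError("m/z and intensity arrays differ in length")
    metadata = {k: v for k, v in record.items() if k not in array_keys}
    return Spectrum(mz=mz, intensity=intensity, metadata=metadata)
```

With pyteomics installed, something like `[spectrum_from_record(r) for r in mgf.read(path)]` should then work the same way for `mzml.read(path)`, so the rest of the code would only ever see `Spectrum` objects.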
Regarding the extension problem, the major formats are fairly easy to detect by reading, say, the first 100 bytes of the file itself: an mzML file should start with `<?xml version="1.0" encoding="utf-8"?>` followed by `<mzML` or `<indexedmzML`, an MGF file should have `BEGIN IONS` somewhere close to the top, and so on. Alternatively, the file can be fed to each parser in turn, and the first one that does not complain is the right one. Finally, the format can be specified by the user, although that puts an extra burden on them.
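A rough sketch of the sniffing approach, using only the standard library. The function name `sniff_spectrum_format` and the return labels are hypothetical. The default of 1024 bytes instead of 100 is my own assumption, since an MGF file may have a few header lines before the first `BEGIN IONS`.

```python
from typing import Optional


def sniff_spectrum_format(path: str, n_bytes: int = 1024) -> Optional[str]:
    """Guess the spectrum file format from the first bytes of the file.

    Returns "mzml", "mgf", or None if neither signature is found.
    The file extension is deliberately ignored.
    """
    with open(path, "rb") as handle:
        head = handle.read(n_bytes)
    # Undecodable bytes are replaced so that binary files do not raise here.
    text = head.decode("utf-8", errors="replace")
    # Root element of a plain or an indexed mzML file.
    if "<mzML" in text or "<indexedmzML" in text:
        return "mzml"
    # Every MGF spectrum block opens with this line.
    if "BEGIN IONS" in text:
        return "mgf"
    return None
```

If this returns `None`, the code could still fall back to trying the parsers one after another, or to asking the user for the format.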