Open jorainer opened 5 years ago
Just realized I can't request any reviewers - so please @meowcat , @michaelwitting @Treutler , have a look at it and let's discuss.
Thanks for the input. It looks good already, but do we want a SpectrumParser
or do we want a more general FormatHandler
(or whatever) which has both import (parser) and export (writer)?
We can of course have them separate, but ideally for any format the Parser and the Writer implementation would successfully round-trip (and we could use this as a unit test) and ideally also use the same schema definitions...
Good point! I just don't like the name yet. What about SpectrumRecord
and e.g. MassbankRecord
?
I also don't like the name :) But Record
is also so-so because it describes the entity of a record more than the facility to handle it.
By the way, for this class I suggest we also specify something like the "subset of fields it can successfully round-trip", because not all formats will be able to handle everything lossless for import and export (even though we would like it). The validation / unittest procedure would then possibly be:
An alternative would be to set all the not-supported fields to NA
. That's at least what we have e.g. in mzR
as not all input files provide all spectrum metadata.
A field can be NA if there is no data in the file, regardless of whether the file format actually supports the field or not. E.g. for MassBank records there is a vast amount of optional fields. I think specifying the supported fields per file format will be important when it comes to reproducibility/testing.
Agree. And that's exactly the point. I will try to define a schema for hmdb an importer, unit tests etc and then make a pull request (or actually change this one).
Name: SpectrumAccessor
, MassbankAccessor
???
Our FormatAdapter
(or whatever) is similar to a BridgeDb IDMapper. A FormatAdapter
could have
The FormatAdapter
would than do the import and export and map the fields from file to the fields from the general format and vice versa.
Other name ideas:
SpectrumTranslator
SpectrumConverter
But @jorainer for now you can use what you like best, changing the name will be the smallest problem if we don't like it.
This is a small proposal to help unifying the API for the spectrum parsers. The idea is to have a virtual class
SpectrumParser
with a defined set of methods. Parser implementations should extend this class and implement all of its methods.We would have to agree also on a common set of methods and its input and result arguments. I've added one method so far, but more (or different ones) might be helpful.
importSpectrum
takes aSpectrumParser
object and a file, reads its content, maps its properties to the standard fields (using the schema) and returns aSpectra
object.If this method or the return type makes sense needs to be discussed further. I just wanted to propose one way how a single API for the user could be defined that uses different parsers to read data from different sources.
An example workflow to read data from a single file would be the following (based on the toy
MassbankParser
object extendingSpectrumParser
I've implemented):Let's discuss now.