meowcat / MSnio

3 stars 3 forks source link

Add SpectrumParser virtual class #5

Open jorainer opened 5 years ago

jorainer commented 5 years ago

This is a small proposal to help unifying the API for the spectrum parsers. The idea is to have a virtual class SpectrumParser with a defined set of methods. Parser implementations should extend this class and implement all of its methods.

We would have to agree also on a common set of methods and its input and result arguments. I've added one method so far, but more (or different ones) might be helpful.

If this method or the return type makes sense needs to be discussed further. I just wanted to propose one way how a single API for the user could be defined that uses different parsers to read data from different sources.

An example workflow to read data from a single file would be the following (based on the toy MassbankParser object extending SpectrumParser I've implemented):

library(MSnio)
## Create and intialize a parser; this should read/import the corresponding schema
prs <- MassbankParser()
## Spectrum file
fl <- system.file("spectra/massbank/EA016614.txt", package="MSnio")
res <- importSpectrum(prs, fl)

## res should then ideally be a `Spectra` object - or the new `Spectrum` object.

Let's discuss now.

jorainer commented 5 years ago

Just realized I can't request any reviewers - so please @meowcat , @michaelwitting @Treutler , have a look at it and let's discuss.

meowcat commented 5 years ago

Thanks for the input. It looks good already, but do we want a SpectrumParser or do we want a more general FormatHandler (or whatever) which has both import (parser) and export (writer)?

meowcat commented 5 years ago

We can of course have them separate, but ideally for any format the Parser and the Writer implementation would successfully round-trip (and we could use this as a unit test) and ideally also use the same schema definitions...

jorainer commented 5 years ago

Good point! I just don't like the name yet. What about SpectrumRecord and e.g. MassbankRecord?

meowcat commented 5 years ago

I also don't like the name :) But Record is also so-so because it describes the entity of a record more than the facility to handle it.

By the way, for this class I suggest we also specify something like the "subset of fields it can successfully round-trip", because not all formats will be able to handle everything lossless for import and export (even though we would like it). The validation / unittest procedure would then possibly be:

jorainer commented 5 years ago

An alternative would be to set all the not-supported fields to NA. That's at least what we have e.g. in mzR as not all input files provide all spectrum metadata.

meowcat commented 5 years ago

A field can be NA if there is no data in the file, regardless of whether the file format actually supports the field or not. E.g. for MassBank records there is a vast amount of optional fields. I think specifying the supported fields per file format will be important when it comes to reproducibility/testing.

jorainer commented 5 years ago

Agree. And that's exactly the point. I will try to define a schema for hmdb an importer, unit tests etc and then make a pull request (or actually change this one).

jorainer commented 5 years ago

Name: SpectrumAccessor, MassbankAccessor???

Treutler commented 5 years ago

Our FormatAdapter (or whatever) is similar to a BridgeDb IDMapper. A FormatAdapter could have

The FormatAdapter would than do the import and export and map the fields from file to the fields from the general format and vice versa.

meowcat commented 5 years ago

Other name ideas: SpectrumTranslator SpectrumConverter But @jorainer for now you can use what you like best, changing the name will be the smallest problem if we don't like it.