package design suggestion

jorainer commented 6 years ago

@stanstrup, very nice work indeed! I always wanted to have such a package and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too seriously.

My suggestion(s):

Keep functionality separate from the data: have dedicated data packages. This makes it possible to have data packages from different sources or for different versions. See e.g. ensembldb and the EnsDb.Hsapiens.v75 package, or GenomicFeatures and the separate TxDb packages.

Define a CompoundDb class with the main methods to query and access the database. E.g. have a method compound that retrieves compounds from the database and supports multiple filters. I know that's a somewhat different setup (Bioconductor's rich, S4-based one) than the tidyverse one; still, I think one could combine both worlds.

One could then define e.g. an HMDB class that simply extends CompoundDb to accommodate HMDB-specific fields and attributes.
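To make the class idea concrete, here is a minimal sketch using only base R's S4 system. Everything in it is an assumption for illustration: the slot names, the `HmdbDb` class name, the `mass_range` and `name` arguments, and the in-memory data.frame that stands in for a real database connection. The compound IDs are made up; only the monoisotopic masses are real values.

```r
library(methods)

# Hypothetical base class: a data.frame stands in for a database backend.
setClass("CompoundDb",
         slots = c(compounds = "data.frame", source = "character"))

# Hypothetical HMDB-specific subclass adding one extra field.
setClass("HmdbDb", contains = "CompoundDb",
         slots = c(hmdb_version = "character"))

setGeneric("compound", function(x, ...) standardGeneric("compound"))

# Retrieve compounds, optionally restricted by a mass range and/or names.
# Both restrictions are applied together (logical AND).
setMethod("compound", "CompoundDb",
          function(x, mass_range = NULL, name = NULL, ...) {
    res <- x@compounds
    if (!is.null(mass_range))
        res <- res[res$mass >= mass_range[1] & res$mass <= mass_range[2], ,
                   drop = FALSE]
    if (!is.null(name))
        res <- res[res$name %in% name, , drop = FALSE]
    res
})

# Toy data: made-up IDs, real monoisotopic masses.
toy <- data.frame(
    id   = c("CMP1", "CMP2", "CMP3"),
    name = c("glucose", "caffeine", "citric acid"),
    mass = c(180.0634, 194.0804, 192.0270),
    stringsAsFactors = FALSE
)
db <- new("HmdbDb", compounds = toy, source = "toy example",
          hmdb_version = "unknown")

# The subclass inherits the compound() method defined for CompoundDb.
hits <- compound(db, mass_range = c(190, 195))
```

Because `HmdbDb` inherits from `CompoundDb`, the same `compound()` call would work unchanged on any other resource that extends the base class, which is the point of the shared interface.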

What might also be interesting is to build the filters on (or create filters that extend) AnnotationFilter, e.g. an MzFilter or a MassFilter. The MzFilter would have to calculate the mass for the provided mz. Ideal would be a MassFilter that takes an MzFilter as input, calculates the theoretical mass for the mz and returns a MassFilter.
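The mz-to-mass conversion could look roughly like the sketch below. It deliberately does not depend on AnnotationFilter: `BaseFilter` is a stand-in for whatever AnnotationFilter class the real filters would extend, and the slot names, the `adduct` slot, and the `toMassFilter` helper are all hypothetical. Only two singly charged adducts are handled, which is an assumption of this example and not a proposal for the final design.

```r
library(methods)

# Stand-in virtual base class; real filters would extend AnnotationFilter.
setClass("BaseFilter", contains = "VIRTUAL",
         slots = c(value = "numeric", tolerance = "numeric"))

# Hypothetical filters: MzFilter additionally records the assumed adduct.
setClass("MzFilter", contains = "BaseFilter",
         slots = c(adduct = "character"))
setClass("MassFilter", contains = "BaseFilter")

PROTON_MASS <- 1.007276  # mass of a proton in Da

# Hypothetical helper: convert an MzFilter into a MassFilter by removing
# the adduct contribution from the m/z value (singly charged ions only).
toMassFilter <- function(x) {
    stopifnot(is(x, "MzFilter"))
    mass <- switch(x@adduct,
        "[M+H]+" = x@value - PROTON_MASS,
        "[M-H]-" = x@value + PROTON_MASS,
        stop("unsupported adduct: ", x@adduct)
    )
    new("MassFilter", value = mass, tolerance = x@tolerance)
}

# m/z of protonated glucose; the resulting neutral mass is about 180.0634.
mzf <- new("MzFilter", value = 181.0707, tolerance = 0.005,
           adduct = "[M+H]+")
mf <- toMassFilter(mzf)
```

Keeping the conversion in one place means the database method only ever needs to understand a MassFilter, regardless of whether the user started from an m/z value or a mass.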

The big advantages of having this setup would be:

Versioned data packages.
Data packages could be added to AnnotationHub.
Using the same interface (methods and filters) for different data resources simplifies the use for the user (e.g. use the same method to retrieve
Integration into Bioconductor. Common concept for annotation resources.
You don't have to worry about licensing of the data resource in the main package. Each data package could/should have its own version. For resources that don't allow sharing the data, you could just provide the functionality to create the resource in the package and leave it to the users to create the package for themselves (if they have the license to do so).

I would be happy to contribute here (especially related to the database class, interface methods and filters as I did all this already in ensembldb).

open for discussion