stanstrup / PeakABro

Peaklist Annotator and Browser
8 stars 1 forks source link

package design suggestion #6

Open jorainer opened 6 years ago

jorainer commented 6 years ago

@stanstrup, very nice work indeed! Always wanted to have such a packages and also started implementing something (https://github.com/jotsetung/xcmsExtensions), but never too serious.

My suggestion(s):

Keep functionality separate from the data: Have dedicated data packages. This allows to have data packages from different sources or from different versions. See e.g. ensembldb and the EnsDb.Hsapiens.v75 package, or GenomicFeatures and the separate TxDb packages.

Define a CompoundDb class with main methods to query and access the database. E.g. have a method compound that retrieves compounds from the database and supports multiple filters. I know that's a little different setup (Bioconductor's rich, S4-based) than the tidyverse one, still, I think one could combine both worlds.

One could then define e.g. a HMDB class that simply extends the CompoundDb to accommodate HMDB-specific fields and attributes.

What might also be interesting is to implement the filters (or create filters that extend) AnnotationFilter (e.g. MzFilter or MassFilter). The MzFilter would have to calculate the mass for the provided mz. Ideal would be to have a MassFilter that takes a MzFilter as input, calculated the theoretical mass for the mz and returns a MassFilter.

The big advantages of having this setup would be:

I would be happy to contribute here (especially related to the database class, interface methods and filters as I did all this already in ensembldb).

open for discussion

stanstrup commented 6 years ago

Ad data package: Yes I think you are definitely right that this is the way to go. I will do that when I have parsed HMDB (need to ask if I can be allowed to include it) and pubchem.

Ad the the S4 stuff. I am not against that at all. But I think I have to say that that is not priority for me as I have not yet dwelled into the whole S4-thing yet. So right now it will be some investment of time to learn that stuff and less easy for me to maintain. I would also not want to drop the tidyverse style code as I find this so much easier to work with. But I guess that can be done as you said. That said if you want to start on this I would be grateful and more than happy to change the structure. It is something I really want but I also need to priorities actually using the tool to do the projects I am supposed to be working on...

jorainer commented 6 years ago

If it's OK for you I would fork your repo and add the code related to the S4 approach - it's easier to describe/discuss if you can really get your hands onto it.

stanstrup commented 6 years ago

Yes of course. I think it would be less messy if you hold on for now though. I am adding the last things I have. The peaklist annotation and the browser itself.

stanstrup commented 6 years ago

@jotsetung I added the last functions and added full example to the README. I don't plan to do much more right now. Perhaps get Pubchem and HMDB parsers done.

jorainer commented 6 years ago

Cool! thanks! Could you please make a branch named e.g. S4? So, if I make a pull request at some time we don't mess with the master branch?

stanstrup commented 6 years ago

Sure! Done.

rsalek commented 6 years ago

I need to catchup, nice work so quickly put together!

stanstrup commented 6 years ago

I am starting to think that it might make sense to lock it down to defining an S4 object that takes a CAMERA object and adds the annotation (or extends the CAMERA object to allow this). That way all the peaktable manipulation can be abstracted away. Downside is that you can then basically only use xcms/CAMERA (though I do have a function to fake a CAMERA object from an MzMine peaklist).

jorainer commented 6 years ago

Supporting to perform the annotation on a CAMERA object makes sense, but I would definitely keep low-level functions that don't required special S4 objects as input, so that it can be applied in general too.